8. The Secret Confidence of Nature

8.1 Kepler, Napier, and the Third Law

There is special providence in the fall of a sparrow. Shakespeare

By the year 1605 Johannes Kepler, working with the relativistic/inertial view of the solar system suggested by Copernicus, had already discerned two important mathematical regularities in the orbital motions of the planets:

I. Planets move in ellipses with the Sun at one focus.
II. The radius vector describes equal areas in equal times.

This shows the crucial role that interpretations and models sometimes play in the progress of science, because it's obvious that these profoundly important observations could never even have been formulated in terms of the Ptolemaic earth-centered model.

Oddly enough, Kepler arrived at these conclusions in reverse order, i.e., he first determined that the radius vector of a planet's "oval shaped" path sweeps out equal areas in equal times, and only subsequently determined that the "ovals" were actually ellipses. It's often been remarked that Kepler's ability to identify this precise shape from its analytic properties was partly due to the careful study of conic sections by the ancient Greeks, particularly Apollonius of Perga, even though this study was conducted before there was even any concept of planetary orbits. Kepler's first law is often cited as an example of how purely mathematical ideas (e.g., the geometrical properties of conic sections) can sometimes find significant applications in the descriptions of physical phenomena.

After painstakingly extracting the above two "laws" of planetary motion (first published in 1609) from the observational data of Tycho Brahe, there followed a period of more than twelve years during which Kepler exercised his ample imagination searching for any further patterns or regularities in the data. He seems to have been motivated by the idea that the orbits of the planets must satisfy a common set of simple mathematical relations, analogous to the mathematical relations which the Pythagoreans had discovered between harmonious musical tones. However, despite all his ingenious efforts during these years, he was unable to discern any significant new pattern beyond the two empirical laws which he had found in 1605. Then, as Kepler later recalled, on the 8th of March in the year 1618, something marvelous "appeared in my head". He suddenly realized that

III. The proportion between the periodic times of any two planets is precisely one and a half times the proportion of the mean distances.

In the form of a diagram, his insight amounts to a log-log plot of the orbital periods against the mean distances, with the planets falling along a straight line of slope 3/2.

At first it may seem surprising that it took a mathematically insightful man like Kepler over twelve years of intensive study to notice this simple linear relationship between the logarithms of the orbital periods and radii. In modern data analysis the log-log plot is a standard format for analyzing physical data. However, we should remember that logarithmic scales had not yet been invented in 1605. A more interesting question is why, after twelve years of struggle, this way of viewing the data suddenly "appeared in his head" early in 1618. (By the way, Kepler made some errors in the calculations in March, and decided the data didn't fit, but two months later, on May 15 the idea "came into his head" again, and this time he got the computations right.)
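
To see just how sharp the pattern is, we can restate it in modern terms (obviously not in any form that was available to Kepler): with the periods T expressed in years and the mean distances a in astronomical units, the ratio log(T)/log(a) comes out almost exactly 3/2 for every planet. The short calculation below uses rough modern values purely for illustration.

    import math

    # Rough modern values: mean distance from the Sun (AU) and orbital period (years).
    # (Earth is omitted because log(1) = 0 makes the ratio indeterminate.)
    planets = {
        "Mercury": (0.387, 0.241),
        "Venus":   (0.723, 0.615),
        "Mars":    (1.524, 1.881),
        "Jupiter": (5.203, 11.862),
        "Saturn":  (9.537, 29.457),
    }

    # Kepler's third law in logarithmic form: log(T) = (3/2) log(a)
    for name, (a, T) in planets.items():
        print(f"{name:8s}  log(T)/log(a) = {math.log(T) / math.log(a):.3f}")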

Is it just coincidental that John Napier's "Mirifici Logarithmorum Canonis Descriptio" (published in 1614) was first seen by Kepler towards the end of the year 1616? We know that Kepler was immediately enthusiastic about logarithms, which is not surprising, considering the masses of computation involved in preparing the Rudolphine Tables. Indeed, he even wrote a book of his own on the subject in 1621. It's also interesting that Kepler initially described his "Third Law" in terms of a 1.5 ratio of proportions, exactly as it would appear in a log-log plot, rather than in the more familiar terms of squared periods and cubed distances. It seems as if a purely mathematical invention, namely logarithms, whose intent was simply to ease the burden of manual arithmetical computations, may have led directly to the discovery/formulation of an important physical law, i.e., Kepler's third law of planetary motion. (Ironically, Kepler's academic mentor, Michael Maestlin, chided him - perhaps in jest? - for even taking an interest in logarithms, remarking that "it is not seemly for a professor of mathematics to be childishly pleased about any shortening of the calculations".) By the 18th of May, 1618, Kepler had fully grasped the logarithmic pattern in the planetary orbits:

Now, because 18 months ago the first dawn, three months ago the broad daylight, but a very few days ago the full Sun of a most highly remarkable spectacle has risen, nothing holds me back.

It's interesting to compare this with Einstein's famous comment about "...years of anxious searching in the dark, with their intense longing, the final emergence into the light--only those who have experienced it can understand it".

Kepler announced his Third Law in Harmonices Mundi, published in 1619, and also included it in his "Ephemerides" of 1620. The latter was actually dedicated to Napier, who had died in 1617. The cover illustration showed one of Galileo's telescopes, the figure of an elliptical orbit, and an allegorical female (Nature?) crowned with a wreath consisting of the Naperian logarithm of half the radius of a circle. It has usually been supposed that this work was dedicated to Napier in gratitude for the "shortening of the calculations", but Kepler obviously recognized that it went deeper than this, i.e., that the Third Law is purely a logarithmic harmony. In a sense, logarithms played a role in Kepler's formulation of the Third Law analogous to the role of Apollonius' conics in his discovery of the First Law, and with the role that tensor analysis and Riemannian geometry played in Einstein's development of the field equations of general relativity. In each of these cases we could ask whether the mathematical structure provided the tool with which the scientist was able to describe some particular phenomenon, or whether the mathematical structure effectively selected an aspect of the phenomena for the scientist to discern.

Just as we can trace Kepler's Third Law of planetary motion back to Napier's invention of logarithms, we can also trace Napier's invention back to even earlier insights. It's no accident that logarithms have applications in the description of Nature. Indeed in his introduction to the tables, Napier wrote

A logarithmic table is a small table by the use of which we can obtain a knowledge of all geometrical dimensions and motions in space...

The reference to motions in space is very appropriate, because Napier originally conceived of his "artificial numbers" (later renamed logarithms, meaning number of the ratio) in purely kinematical terms. In fact, his idea can be expressed in a form that Zeno of Elea would have immediately recognized. Suppose two runners leave the starting gate, travelling at the same speed, and one of them maintains that speed, whereas the speed of the other drops in proportion to his distance from the finish line. The closer the second runner gets to the finish line, the slower he runs. Thus, although he is always moving forward, the second runner never reaches the finish line. As discussed in Section 3.7, this is exactly the kind of scenario that Zeno exploited to illustrate paradoxes of motion. Here, 2000 years later, we find Napier making very different use of it, creating a continuous mapping from the real numbers to his "artificial numbers". With an appropriate choice of units we can express the position x of the first runner as a function of time by x(t) = t, and the position X of the second runner is defined by the differential equation dX/dt = 1 − X, where the position "1" represents the finish line. The solution of this equation is X(t) = 1 − e^(−t). Then Napier defined x(t) as the "logarithm" of 1 − X(t), which is to say, he defined t as the "logarithm" of e^(−t). Of course, the definition of logarithm was subsequently revised so that we now define t as the logarithm of e^t, the latter being the function that equals its own derivative.
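
As a small illustration (using modern units and the modern natural logarithm, with an arbitrary step size, rather than anything resembling Napier's actual tables), the kinematic definition can be simulated directly: integrating the second runner's motion and comparing the first runner's position t with −log of the remaining distance 1 − X shows the two agreeing.

    import math

    # Napier's two runners: the first moves uniformly, x(t) = t; the second satisfies
    # dX/dt = 1 - X, i.e., its speed equals its remaining distance to the finish line at 1.
    dt = 1.0e-5
    t, X = 0.0, 0.0
    while t < 3.0:
        X += (1.0 - X) * dt          # simple Euler step for dX/dt = 1 - X
        t += dt

    remaining = 1.0 - X              # distance the second runner still has to go
    print(t, -math.log(remaining))   # the uniform runner's position ~ -ln(remaining distance)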

The logarithm was one of many examples throughout history of ideas that were "in the air" at a certain time. It had been known since antiquity that the exponents of numbers in a geometric sequence are additive when terms are multiplied together, i.e., we have a^n a^m = a^(n+m). In fact, there are ancient Babylonian tablets containing sequences of powers and problems involving the determination of the exponents of given numbers. In the 1540's Stifel's "Arithmetica integra" included tables of the successive powers of numbers, which was very suggestive for Napier and others searching for ways to reduce the labor involved in precise manual computations.

In the 1580's Viete derived several trigonometric formulas such as

cos(x) cos(y) = [ cos(x − y) + cos(x + y) ] / 2

If we have a table of cosine values this formula enables us to perform multiplication simply by means of addition. For example, to find the product of 0.7831 and 0.9348 we can set cos(x) = 0.7831 and cos(y) = 0.9348 and then look up the angles x,y with these cosines in the table. We find x = 0.67116 and y = 0.36310, from which we have the sum x+y = 1.03426 and the difference x−y = 0.30806. The cosines of the sum and difference can then be looked up in the table, giving cos(x+y) = 0.51116 and cos(x−y) = 0.95292. Half the sum of these two numbers equals the product 0.73204 of the original two numbers. This technique was called prosthaphaeresis (the Greek word for addition and subtraction), and was quickly adopted by scientists such as the Dane Tycho Brahe for performing astronomical calculations. Of course, today we recognize that the above formula is just a disguised version of the simple exponent addition rule, noting that cos(x) = (e^(ix) + e^(−ix))/2.
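
The worked example above is easy to re-check; the lines below simply mechanize the prosthaphaeresis recipe, with a modern arccos standing in for the printed table of cosines (so the small rounding of a five-place table disappears).

    import math

    a, b = 0.7831, 0.9348
    x, y = math.acos(a), math.acos(b)                     # "look up" the angles whose cosines are a and b
    half_sum = (math.cos(x + y) + math.cos(x - y)) / 2.0  # half the sum of cos(x+y) and cos(x-y)
    print(half_sum, a * b)                                # both give 0.73204...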

At about this same time (1594), John Napier was inventing his logarithms, whose purpose was also to reduce multiplication and division to simple addition and subtraction by means of a suitable transformation. However, Napier might never have set aside his anti-Catholic polemics to work on producing his table of logarithms had it not been for an off-hand comment made by Dr. John Craig, who was the physician to James VI of Scotland (later James I of England and Ireland). In 1590 Craig accompanied James and his entourage bound for Norway to meet his prospective bride Anne, who was supposed to have journeyed from Denmark to Scotland the previous year, but had been diverted by a terrible storm and ended up in Norway. (The storm was so severe that several supposed witches were held responsible and were burned.) James' party, too, encountered severe weather, but eventually he met Anne in Oslo and the two were married. On the journey home the royal party visited Tycho Brahe's observatory on the island of Hven, and were entertained by the famous astronomer, well known as the discoverer of the "new star" in the constellation Cassiopeia. During this stay at Brahe's lavish Uraniborg ("castle in the sky") Dr. Craig observed the technique of prosthaphaeresis that Brahe and his assistants used to ease the burden of calculation. When he returned to Scotland, Craig mentioned this to his friend the Baron of Merchiston (aka John Napier), and this seems to have motivated Napier to devote himself to the development of his logarithms and the generation of his tables, on which he spent the remaining 25 years of his life. During this time Napier occasionally sent preliminary results to Brahe for comment.

Several other people had similar ideas about exploiting the exponential mapping for purposes of computation. Indeed, Kepler's friend and assistant Jost Burgi evidently devised a set of "progress tables" (basically anti-logarithm tables) around 1600, based on the indices of geometric progressions, and made some use of these in his calculations. However, he didn't fully perceive the potential of this correspondence, and didn't develop it very far.

Incidentally, if the story of a group of storm-tossed nobles finding themselves on a mysterious island ruled over by a magician sounds familiar, it may be because of Shakespeare's "The Tempest", written in 1610. This was Shakespeare's last complete play and, along with Love's Labor's Lost, one of his only two original plots, i.e., these are the only two of his plays whose plots are not known to have been based on pre-existing works. It is commonly believed that the plot of "The Tempest" was inspired by reports of a group of colonists bound for Virginia who were shipwrecked in Bermuda in 1609. However, it's also possible that Shakespeare had in mind the story of James VI (who by 1610 was James I, King of England) and his marriage expedition, arriving after a series of violent storms on the island of the Danish astronomer and astrologer Tycho Brahe and his castle in the sky (which, we may recall, included a menagerie of exotic animals). We know "The Tempest" was produced at the royal court in 1611 and again in 1612 as part of the festivities preceding the marriage of the King's daughter, and it certainly seems likely that James and Anne would associate any story involving a tempest with their memories of the great storms of 1589 and 1590 that delayed Anne's voyage to Scotland and prompted James' journey to meet her. The providential aspects of Shakespeare's "The Tempest" and its parallels with their own experiences could hardly have been lost on them.

Shakespeare's choice of the peculiar names Rosencrantz and Guildenstern for two minor characters in "Hamlet, Prince of Denmark" gives further support to the idea that he was familiar with Tycho, since those were the names of two of Tycho's ancestors appearing on his coat of arms. There is also evidence that Shakespeare was personally close to the Digges family (e.g., Leonard Digges contributed a sonnet to the first Folio), and Thomas Digges was an English astronomer and mathematician who, along with John Dee, was well acquainted with Tycho. Digges was an early supporter and interpreter of Copernicus' relativistic ideas, and was apparently the first to suggest that our Sun was just an ordinary star in an infinite universe of stars.

Considering all this, it is surely not too farfetched to suggest that Tycho may have been the model for Prospero, whose name, being composed of Providence and sparrow, is an example of Shakespeare's remarkable ability to weave a variety of ideas, influences, and connotations into the fabric of his plays, just as we can see in Kepler's three laws the synthesis of the heliocentric model of Copernicus, Apollonius' conics, and the logarithms of Napier.

8.2 Newton's Cosmological Queries

Isack received your letter and I perceived you letter from mee with your cloth but none to you your sisters present thai love to you with my motherly lov and prayers to god for you I your loving mother hanah wollstrup may the 6. 1665

Newton famously declared that it is not the business of science to make hypotheses. However, it's well to remember that this position was formulated in the midst of a bitter dispute with Robert Hooke, who had criticized Newton's writings on optics when they were first communicated to the Royal Society in the early 1670's. The essence of Newton's thesis was that white light is composed of a mixture of light of different elementary colors, ranging across the visible spectrum, which he had demonstrated by decomposing white light into its separate colors and then reassembling those components to produce white light again. However, in his description of the phenomena of color Newton originally included some remarks about his corpuscular conception of light (perhaps akin to the cogs and flywheels in terms of which James Maxwell was later to conceive of the phenomena of electromagnetism). Hooke interpreted the whole of Newton's optical work as an attempt to legitimize this corpuscular hypothesis, and countered with various objections.

Newton quickly realized his mistake in attaching his theory of colors to any particular hypothesis on the fundamental nature of light, and immediately back-tracked, arguing that his intent had been only to describe the observable phenomena, without regard to any hypotheses as to the cause of the phenomena. Hooke (and others) continued to criticize Newton's theory of colors by arguing against the corpuscular hypothesis, causing Newton to respond more and more angrily that he was making no hypothesis, he was describing the way things are, and not claiming to explain why they are. This was a bitter lesson for Newton and, in addition to initiating a life-long feud with Hooke, went a long way toward shaping Newton's rhetoric about what science should be.

I use the term "rhetoric" because it is to some extent a matter of semantics as to whether a descriptive theory entails a causative hypothesis. For example, when accused of invoking an occult phenomenon in gravity, Newton replied that the phenomena of gravity are not occult, although the causes may be. (See below.) Clearly the dispute with Hooke had caused Newton to paint himself into the "hypotheses non fingo" corner, and this somewhat accidentally became part of his legacy to science, which has ever after been much more descriptive and less explanatory than, say, Descartes would have wished. This is particularly ironic in view of the fact that Newton personally entertained a great many bold hypotheses, including a number of semi-mystical hermetic explanations for all manner of things, not to mention his painstaking interpretations of biblical prophecies. Most of these he kept to himself, but when he finally got around to publishing his optical papers (after Hooke had died) he couldn't resist including a list of 31 "Queries" concerning the big cosmic issues that he had been too reticent to address publicly before. The true nature of these "queries" can immediately be gathered from the fact that every one of them is phrased in the form of a negative question, as in "Are not the Rays of Light very small bodies emitted from shining substances?" Each one is plainly a hypothesis phrased as a question.

The first edition of The Opticks (1704) contained only 16 queries, but when the Latin edition was published in 1706 Newton was emboldened to add seven more, which ultimately became Queries 25 through 31 when, in the second English edition, he added Queries 17 through 24. Of all these, one of the most intriguing is Query 28, which begins with the rhetorical question "Are not all Hypotheses erroneous in which Light is supposed to consist of Pression or Motion propagated through a fluid medium?" In this query Newton rejects the Cartesian idea of a material substance filling in and comprising the space between particles. Newton preferred an atomistic view, believing that all substances were composed of hard impenetrable particles moving and interacting via innate forces in an empty space (as described further in Query 31). After listing several facts that make an aethereal medium inconsistent with observations, the discussion of Query 28 continues

And for rejecting such a medium, we have the authority of those the oldest and most celebrated philosophers of ancient Greece and Phoenicia, who made a vacuum and atoms and the gravity of atoms the first principles of their philosophy, tacitly attributing gravity to some other cause than dense matter. Later philosophers banish the consideration of such a cause... feigning [instead] hypotheses for explaining all things mechanically [But] the main business of natural philosophy is to argue from phenomena without feigning hypotheses, and to deduce causes from effects, till we come to the very first cause, which certainly is not mechanical.

And not only to unfold the mechanism of the world, but chiefly to resolve such questions as What is there in places empty of matter? and Whence is it that the sun and planets gravitate toward one another without dense matter between them? Whence is it that Nature doth nothing in vain? and Whence arises all that order and beauty which we see in the world? To what end are comets? and Whence is it that planets move all one and the same way in orbs concentrick, while comets move all manner of ways in orbs very excentrick? and What hinders the fixed stars from falling upon one another?

It's interesting to compare these comments of Newton with those of Socrates as recorded in Plato's Phaedo

If then one wished to know the cause of each thing, why it comes to be or perishes or exists, one had to find what was the best way for it to be, or to be acted upon, or to act. I was ready to find out ... about the sun and the moon and the other heavenly bodies, about their relative speed, their turnings, and whatever else happened to them, how it is best that each should act or be acted upon. I never thought [we would need to] bring in any other cause for them than that it was best for them to be as they are.

This wonderful hope was dashed as I went on reading, and saw that [men] mention as causes air and ether and water and many other strange things... It is what the majority appear to do, like people groping in the dark; they call it a cause, thus giving it a name which does not belong to it. That is why one man surrounds the earth with a vortex to make the heavens keep it in place, another makes the air support it like a wide lid. As for their capacity of being in the best place they could possibly be put, this they do not look for, nor do they believe it to have any divine force, but they believe that they will some time discover a stronger and more immortal Atlas to hold everything together...

Both men are suggesting that a hierarchy of mechanical causes cannot ultimately prove satisfactory, and that the first cause of things cannot be mechanistic in nature. Both suggest that the macroscopic mechanisms of the world are just manifestations of an underlying and irreducible principle of "order and beauty", indeed of a "divine force". But Newton wasn't content to leave it at this. After lengthy deliberations, and discussions with David Gregory, he decided to add the comment

Is not Infinite Space the Sensorium of a Being incorporeal, living and intelligent, who sees the things themselves intimately, and thoroughly perceives them, and comprehends them wholly by their immediate presence to himself?

Samuel Johnson once recommended a proof-reading technique to a young writer, telling him that you should read over your work carefully, and whenever you come across a phrase or passage that seems particularly fine, strike it out. Newton's literal identification of Infinite Space with the Sensorium of God may have been a candidate for that treatment, but it went to press anyway. However, as soon as the edition was released, Newton suddenly got cold feet, and realized that he'd exposed himself to ridicule. He desperately tried to recall the book and, failing that, he personally rounded up all the copies he could find, cut out the offending passage with scissors, and pasted in a new version. Hence the official versions contain the gentler statement (reverting once again to the negative question!):

And these things being rightly dispatch'd, does it not appear from phaenomena that there is a Being incorporeal, living, intelligent, omnipresent, who in infinite space, as it were in his Sensory, sees the things themselves intimately, and thoroughly perceives them, and comprehends them wholly by their immediate presence to himself: Of which things the images only carried through the organs of sense into our little sensoriums are there seen and beheld by that which in us perceives and thinks. And though every true step made in this philosophy brings us not immediately to the knowledge of the first cause, yet it brings us nearer to it...

Incidentally, despite Newton's efforts to prevent it, one of the un-repaired copies had already made its way out of the country, and was on its way to Leibniz, who predictably cited the original "Sensorium of God" comment as evidence that Newton "has little success with metaphysics".

Newton's 29th Query (not a hypothesis, mind you) was: "Are not the rays of light very small bodies emitted from shining substances?" Considering that his mooting of this idea over thirty years earlier had precipitated a controversy that nearly led him to a nervous breakdown, one has to say that Newton was nothing if not tenacious. This query also demonstrates how little his basic ideas about the nature of light had changed over the course of his life. After listing numerous reasons for suspecting that the answer to this question was Yes, Newton proceeded in Query 30 to ask the pregnant question "Are not gross bodies and light convertible into one another?" Following Newton's rhetorical device, should not this be interpreted as a suggestion of equivalence between mass and energy?

The final pages of The Opticks are devoted to Query 31, which begins

Have not the small particles of bodies certain powers, virtues, or forces, by which they act at a distance, not only upon the rays of light for reflecting, refracting, and inflecting them, but also upon one another for producing a great part of the Phenomena of nature?

Newton goes on to speculate that the force of electricity operates on very small scales to hold the parts of chemicals together and govern their interactions, anticipating the modern theory of chemistry. Most of this Query is devoted to an extensive (20 pages!) enumeration of chemical phenomena that Newton wished to cite in support of this view. He then returns to the behavior of macroscopic objects, asserting that

Nature will be very conformable to herself, and very simple, performing all the great motions of the heavenly bodies by the attraction of gravity which intercedes those bodies, and almost all the small ones of their particles by some other attractive and repelling powers which intercede the particles.

This is a very clear expression of Newton's belief that forces act between separate particles, i.e., at a distance. He continues

The Vis inertiae is a passive Principle by which Bodies persist in their Motion or Rest, receive Motion in proportion to the Force impressing it, and resist as much as they are resisted. By this Principle alone there never could have been any Motion in the World. Some other Principle was necessary for putting Bodies into Motion; and now they are in Motion, some other Principle is necessary for conserving the motion.

In other words, Newton is arguing that the principle of inertia, by itself, cannot account for the motion we observe in the world, because inertia only tends to preserve existing states of motion, and only uniform motion in a straight line. Thus we must account for the initial states of motion (the initial conditions), the persistence of non-inertial motions, and for the on-going variations in the amount of motion that are observed. For this purpose Newton distinguishes between "passive" attributes of bodies, such as inertia, and "active" attributes of bodies, such as gravity, and he points out that, were it not for gravity, the planets would not remain in their orbits, etc, so it is necessary for bodies to possess active as well as passive attributes, because otherwise everything would soon be diffuse and cold. Thus he is not saying that the planets would simply come to a halt in the absence of active attributes, but rather that the constituents of any physical universe resembling ours (containing persistent non-inertial motion) must necessarily possess active as well as passive properties.

Next, Newton argues that the "amount of motion" in the world is not constant, in two different respects. The first is rather interesting, because it makes very clear the fact that he regarded ontological motion as absolute. He considers two identical globes in empty space attached by a slender rod and revolving with angular speed ω about their combined center of mass, and he says the center of mass is moving with some velocity v (in the plane of revolution). If the radius from the center of mass to each globe is r, then the globes have a speed of ωr relative to the center. When the connecting rod is periodically oriented perpendicular to the velocity of the center, one of the globes has a speed equal to v + ωr and the other a speed equal to v − ωr, so the total "amount of motion" (i.e., the sum of the magnitudes of the momentums) is simply 2mv. However, when the rod is periodically aligned parallel to the velocity of the center, the globes each have a total speed of √(v² + (ωr)²), so the total "amount of motion" is

2m √(v² + (ωr)²)

Thus, Newton argues, the total quantity of motion of the two globes fluctuates periodically between this value and 2mv. Obviously he is expressing the belief that the "amount of motion" has absolute significance. (He doesn't remark on the fact that the kinetic energy in this situation is conserved).
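
To make Newton's bookkeeping concrete, here is a small numerical sketch with arbitrary illustrative values (unit masses, a drift speed v, and a spin rate ω): as the rod turns, the sum of the momentum magnitudes oscillates between 2mv and 2m√(v² + (ωr)²), while the total kinetic energy never changes.

    import math

    m, v, w, r = 1.0, 2.0, 3.0, 0.5              # mass of each globe, drift speed, angular speed, rod half-length
    for phi in (0.0, math.pi / 4, math.pi / 2):  # angle between the rod and the drift velocity
        # each globe's velocity = drift velocity (along x) + spin velocity (perpendicular to the rod)
        v1 = (v - w * r * math.sin(phi),  w * r * math.cos(phi))
        v2 = (v + w * r * math.sin(phi), -w * r * math.cos(phi))
        motion = m * math.hypot(*v1) + m * math.hypot(*v2)        # Newton's "amount of motion"
        energy = 0.5 * m * (v1[0]**2 + v1[1]**2) + 0.5 * m * (v2[0]**2 + v2[1]**2)
        print(f"rod angle {phi:.2f}: amount of motion = {motion:.4f}, kinetic energy = {energy:.4f}")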

The other way in which, Newton argues, the amount of motion is not conserved is in inelastic collisions, such as when two masses of clay collide and the bodies stick together. Of course, even in this case the momentum vector is conserved, but again the sum of the magnitudes of the individual momentums is reduced. Also, in this case, the kinetic energy is dissipated as heat. Interestingly, Newton observes that, aside from the periodic fluctuations such as with the revolving globes, the net secular change in total "amount of motion" is always negative.

By reason of the tenacity of fluids, the attrition of their parts... motion is much more apt to be lost than got, and is always upon the Decay.

This can easily be seen as an early statement of statistical thermodynamics and the law of entropy. In any case, from this tendency for motion to decay, Newton concludes that eventually the Universe must "run down", and "all things would grow cold and freeze, and become inactive masses".

Newton also mentions one further sense in which (he believed) passive attributes alone were insufficient to account for the persistence of well-ordered motion that we observe.

...blind fate could never make all the planets move one and the same way in orbs concentrick, some inconsiderable irregularities excepted, which may have risen from the action of comets and planets upon one another, and which will be apt to increase, till this system wants a reformation.

In addition to whatever sense of design and/or purpose we may discern in the initial conditions of the solar system, Newton also seems to be hinting at the idea that, in the long run, any initial irregularities, however "inconsiderable" they may be, will increase until the system wants reformation. In recent years we've gained a better appreciation of the fact that Newton's laws, though strictly deterministic, are nevertheless potentially chaotic, so that the overall long-term course of events can quickly come to depend on arbitrarily slight variations in initial conditions, rendering the results unpredictable on the basis of any fixed level of precision.

So, for all these reasons, Newton argues that passive principles such as inertia cannot suffice to account for what we observe. We also require active principles, among which he includes gravity, electricity, and magnetism. Beyond this, Newton suggests that the ultimate "active principle" underlying all the order and beauty we find in the world, is God, who not only set things in motion, but from time to time must actively intervene to restore their motion. This was an important point for Newton, because he was genuinely concerned about the moral implications of a scientific theory that explained everything as the inevitable consequence of mechanical principles. This is why he labored so hard to reconcile his clockwork universe with an on-going active role for God. He seems to have found this role in the task of resisting an inevitable inclination of our mechanisms to descend into dissipation and veer into chaos.

In this final Query Newton also took the opportunity to explicitly defend his abstract principles such as inertia and gravity, which some critics charged were occult.

These principles I consider not as occult qualities...but as general laws of nature, by which the things themselves are formed, their truth appearing to us by phenomena, though their causes be not yet discovered. For these are manifest qualities, and their causes only are occult. The Aristotelians gave the name of occult qualities not to manifest qualities, but to such qualities only as they supposed to lie hid in Bodies, and to be the unknown causes of manifest effects, such as would be the causes of gravity... if we should suppose that these forces or actions arose from qualities unknown to us, and uncapable of being made known and manifest. Such occult qualities put a stop to the improvement of natural philosophy, and therefore of late years have been rejected. To tell us that every species of things is endowed with an occult specific quality by which it acts and produces manifest effects is to tell us nothing...

The last set of Queries to be added, now numbered 17 through 24, appeared in the second English edition in 1717, when Newton was 75. These are remarkable in that they argue for an aether permeating all of space - despite the fact that Queries 25 through 31 argue at length against the necessity for an aether, and those were hardly altered at all when Newton added the new Queries which advocate an aether. (It may be worth noting, however, that the reference to "empty space" in the original version of Query 28 was changed at some point to "nearly empty space".) It seems to be the general opinion among Newtonian scholars that these "Aether Queries" inserted by Newton in his old age were simply attempts "to placate critics by seeming retreats to more conventional positions". The word "seeming" is well chosen, because we find in Query 21 the comments

And so if any one should suppose that aether (like our air) may contain particles which endeavour to recede from one another (for I do not know what this aether is), and that its particles are exceedingly smaller than those of air, or even than those of light, the exceeding smallness of its particles may contribute to the greatness of the force by which those particles may recede from one another, and thereby make that medium exceedingly more rare and elastick than air, and by consequence exceedingly less able to resist the motions of projectiles, and exceedingly more able to press upon gross bodies, by endeavoring to expand itself.

Thus Newton not only continues to view light as consisting of particles, but imagines that the putative aether may also be composed of particles, between which primitive forces operate to govern their movements. It seems that the aether of these queries was a distinctly Newtonian one, and its purpose was as much to serve as a possible mechanism for gravity as for the refraction and reflection of light. It's disconcerting that Newton continued to be misled by his erroneous belief that refracted paths proceed from more dense to less dense regions, which required him to posit an aether surrounding the Sun with a density that increases with distance, so that the motion of the planets may be seen as a tendency to veer toward less dense parts of the aether.

There's a striking parallel between this set of "pro-Aether Queries" of Newton and the famous essay "Ether and the Theory of Relativity", in which Einstein tried to reconcile his view of physics with something that could be termed an ether. Of course, it turned out to be a distinctly Einsteinian ether, immaterial, and incapable of being assigned any place or state of motion.

Since I've credited Newton with suggesting the second law of thermodynamics and mass-energy equivalence, I may as well mention that he could also be regarded as the originator of the notorious "cosmological constant", which has had such a checkered history in the theory of relativity. Recall that the weak/slow limit of Einstein's field equations without the cosmological term corresponds to a gravitational relation of the familiar inverse-square form

F = GMm/r²        (1)

but if a non-zero cosmological constant Λ is assumed the weak/slow limit acquires a term directly proportional to the distance

F = GMm/r² − (Λc²/3) m r        (2)

As it happens, Newton explored the consequences of a wide range of central force laws in the Principia, and determined that the only two forms for which spherically symmetrical masses can be treated as if all the mass was located at the central point are F = k/r² and F = kr. (See Propositions LXXVII and LXXVIII in Book I). In addition to this distinctive spherical symmetry property (analogous to Birkhoff's theorem for general relativity), these are also the only two central force laws for which the orbits in a two-body system are perfect conic sections (see Proposition X), although in the case of a force directly proportional to the distance the center of force is at the center of the conic, rather than at a focus. In the Scholium following the discussion of spherically symmetrical bodies Newton wrote

I have now explained the two principal cases of attractions; to wit, when the centripetal forces decrease as the square of the ratio of the distances, or increase in a simple ratio of the distances, causing the bodies in both cases to revolve in conic sections, and composing spherical bodies whose centripetal forces observe the same law of increase or decrease in the recess from the center as the forces from the particles themselves do; which is very remarkable.

Considering that Newton referred to these two special cases as the two principal cases of "attraction", it's not too much of a stretch to say that the full general law of attraction (or gravitation) developed in the Principia was actually (2) rather than (1), and it was only in Book III (The System of the World), in which the laws are fit to actual observed phenomena, that he concludes there is no (discernable) evidence for the direct term. The situation is essentially the same today, i.e., on a purely formal mathematical basis the cosmological term seems to "fit", at least up to a point, but the empirical justification for it remains unclear. If Λ is non-zero, it must be quite small, at least in the current epoch. So I think it can be said with some justification that Newton actually originated the cosmological term in theoretical investigations of gravity.

As an example of how seriously Newton took these "non-physical" possibilities, he noted that with an inverse-square law the introduction of a third body generally destroys perfect ellipticity of the orbits, causing the ellipses to precess, whereas in Proposition LXIV he shows that with a pure direct force law F = kr this is not the case. In other words, the orbits remain perfectly elliptical even with three or more gravitating bodies, although the presence of more bodies increases the velocities and decreases the periods of the orbits.

These serious considerations show that Newton wasn't simply trying to fit data to a model. He was interested in the same aspect of science that Einstein said interested him the most, namely, "whether God had any choice in how he created the world". This may be a somewhat melodramatic way of expressing it, but the basic idea is clear. It isn't enough to discern that objects appear to obey an inverse square law of attraction; Newton wanted to understand what was special about the inverse square, and why nature chose that form rather than some other. Socrates alluded to this same wish in Phaedo:

If then one wished to know the cause of each thing, why it comes to be or perishes or exists, one had to find out what was the best way for it to be, or to be acted upon, or to act.

Although this attitude may strike us as silly, it seems undeniable that it's been an animating factor in the minds of some of the greatest scientists – the urge to comprehend not just what is, but why it must be so.

8.3 The Helen of Geometers

I first have to learn to watch very respectfully as the masters of creativity perform their intellectual climbing feats, while I stay bowleggedly below in the valley mist. I already have a premonition that up there the sun is always shining! Hedwig Born to Einstein, 1919

The curve traced out by a point on the rim of a rolling circle is called a cycloid, and we've seen that this curve describes gravitational free-fall, both in Newtonian mechanics and in general relativity (in terms of the free-falling proper time). Remarkably, this curve has been a significant object of study for almost every major scientist mentioned in this book, and has been called "the Helen of geometers" because of all the disputes it has provoked between mathematicians. It was first discussed by Charles Bouvelles in 1501 as a mechanical means of squaring the circle. Subsequently Galileo and his student Viviani studied the curve, finding a method of constructing tangents, and Galileo suggested that it might be a suitable shape for an arch bridge.
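
For reference, the curve can be written down explicitly (in the standard modern parametrization, which of course is not how the early investigators described it): if a circle of radius r rolls along a line and θ is the angle through which it has turned, the point on the rim traces

x(θ) = r (θ − sin θ),        y(θ) = r (1 − cos θ)

One arch corresponds to 0 ≤ θ ≤ 2π, and Roberval's result mentioned below is the statement that the area under that arch is 3πr², three times the area of the rolling circle.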

Mersenne publicized the cycloid among his group of correspondents, including the young Roberval, who, by the 1630's had determined many of the major properties of the cycloid, such as the interesting fact that the area under a complete cycloidal arch is exactly three times the area of the rolling circle. Roberval used his problem-solving techniques in 1634 to win the Mathematics chair at the College Royal, which was determined every three years by an open competition. Unfortunately, the contest did not require full disclosure of the solution methods, so the incumbent (who selected the contest problems) had a strong incentive to keep his best methods a secret, lest they be used to unseat him at the next contest. In retrospect, this was not a very wise arrangement for a teaching position. Roberval held the chair for 40 years, but by keeping his solution methods secret he lost priority for several important discoveries, and became involved in numerous quarrels. One of the men accused by Roberval of plagiarism was Torricelli, who in 1644 was the first to publish an explanation of the area and the tangents of the cycloid. It's now believed that Torricelli arrived at his results independently. (Torricelli served as Galileo's assistant for a brief time, and probably learned of the cycloid from him.)

In 1658, four years after renouncing mathematics as a vainglorious pursuit, Pascal found himself one day suffering from a painful toothache, and in desperation began to think about the cycloid to take his mind off the pain. Quickly the pain abated, and Pascal interpreted this as a sign from the Almighty that he should proceed to study the cycloid, which he did intensively for the next eight days. During this period he rediscovered most of what had already been learned about the cycloid, and several results that were new. Pascal decided to propose a set of challenge problems, with the promise of a first and second prize to be awarded for the best solutions. Roberval was named as one of the judges. Only two sets of solutions were received, from Antoine de Lalouvere and John Wallis, but Pascal and Roberval decided that neither of the entries merited a prize, so no prizes were awarded. Instead, Pascal published his own solutions, along with an essay on the "History of the Cycloid", in which he essentially took Roberval's side in the priority dispute with Torricelli.

The conduct of Pascal's cycloid contest displeased many people, but it had at least one useful side effect. In 1658 Christiaan Huygens was thinking about how to improve the design of clocks, and of course he realized that the period of oscillation of a simple pendulum (i.e., a massive object constrained to move along a circular arc under the vertical force of gravity) is not perfectly independent of the amplitude. Prompted by Pascal's contest, Huygens decided to consider how an object would oscillate if constrained to follow an upside-down cycloidal path, and found to his delight that the frequency of such a system actually is perfectly independent of the amplitude. Thus he had discovered that the cycloid is the tautochrone, i.e., the curve for which the time taken by a particle sliding from any point on the curve to the lowest point on the curve is the same, independent of the starting point. He presented this result in his great treatise "Horologium Oscillatorium" (not published until 1673), in which he clearly described the modern principle of inertia (the foundation of relativity), the law of centripetal force, the conservation of kinetic energy, and many other important concepts of dynamics - ten years before Newton's "Principia".
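
Huygens' tautochrone property can be checked numerically with a short quadrature (the radius, gravity, and release angles below are arbitrary illustrative choices): releasing a bead from rest at different points of an inverted cycloidal arc, the time to reach the bottom comes out the same in every case, equal to π√(r/g).

    import math

    def descent_time(theta0, r=1.0, g=9.81, steps=400000):
        """Time for a bead released from rest at parameter angle theta0 to slide
        (without friction) to the bottom (theta = pi) of an inverted cycloidal arc."""
        total, dtheta = 0.0, (math.pi - theta0) / steps
        for k in range(steps):
            theta = theta0 + (k + 0.5) * dtheta          # midpoint rule avoids the v = 0 endpoint
            ds = 2.0 * r * math.sin(theta / 2.0)         # arc length per unit theta
            v = math.sqrt(2.0 * g * r * (math.cos(theta0) - math.cos(theta)))  # speed from energy conservation
            total += ds / v * dtheta
        return total

    print(math.pi * math.sqrt(1.0 / 9.81))               # the predicted common value
    for theta0 in (0.3, 1.0, 2.0):                       # three different release points
        print(descent_time(theta0))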

The cycloid went on attracting the attention of the world's best mathematicians, and revealing new and remarkable properties. For example, in June of 1696, John Bernoulli issued the following challenge to the other mathematicians of Europe:

If two points A and B are given in a vertical plane, to assign to a mobile particle M the path AMB along which, descending under its own weight, it passes from the point A to the point B in the briefest time.

Pictorially the problem is as shown below:

In accord with its defining property, the requested curve is called the brachistochrone. The solution was first found by Jean and/or Jacques Bernoulli, depending on whom you believe. (Each of the brothers worked on the problem, and they later accused each other of plagiarism.) Jean, who was never accused of understating the significance of his discoveries, revealed his solution in January of 1697 by first reminding his readers of Huygens' tautochrone, and then saying "you will be petrified with astonishment when I say that precisely this same cycloid... is our required brachistochrone".

Incidentally, the Bernoullis were partisans on the side of Leibniz in the famous priority dispute between Leibniz and Newton over the invention of calculus. Before revealing his solution to the brachistochrone challenge problem, Jean Bernoulli along with Leibniz sent a copy of the challenge directly to Newton in England, and included in the public announcement of the challenge the words

...there are fewer who are likely to solve our excellent problems, aye, fewer even among the very mathematicians who boast that [they]... have wonderfully extended its bounds by means of the golden theorems which (they thought) were known to no one, but which in fact had long previously been published by others.

It seems clear the intent was to humiliate the aging Newton (who by then had left Cambridge and was Warden of the Mint), by demonstrating that he was unable to solve a problem that Leibniz and the Bernoullis had solved. The story as recounted by Newton's biographer Conduitt is that Sir Isaac "in the midst of the hurry of the great recoinage did not come home till four from the Tower very much tired, but did not sleep till he had solved it, which was by 4 in the morning." In all, Bernoulli received only three solutions to his challenge problem, one from Leibniz, one from l'Hospital, and one anonymous solution from England. Bernoulli supposedly said he knew who the anonymous author must be, "as the lion is recognized by his print". Newton was obviously proud of his solution, although he commented later that "I do not love to be dunned & teezed by forreigners about Mathematical things..."

It's interesting that Jean Bernoulli apparently arrived at his result from his studies of the path of a light ray through a non-uniform medium. He showed how this problem is related in general to the mechanical problem of an object moving with varying speeds due to any cause. For example, he compared the refractive problem with the mechanical problem whose density is inversely proportional to the speed that a heavy body acquires in gravitational freefall. "In this way", he wrote, "I have solved two important problems - an optical and a mechanical one...". Then he specialized this to Galileo's law of falling bodies, according to which the speeds of two falling bodies are to each other as the square roots of the altitudes traveled. He concluded

Before I end I must voice once more the admiration I feel for the unexpected identity of Huygens' tautochrone and my brachistochrone. I consider it especially remarkable that this coincidence can take place only under the hypothesis of Galileo, so that we even obtain from this a proof of its correctness. Nature always tends to act in the simplest way, and so it here lets one curve serve two different functions, while under any other hypothesis we should need two curves...
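
Bernoulli's optical analogy can be summarized in a few lines of modern notation (a sketch, not his own presentation). Treating the falling body as a light ray in a medium whose refractive index is inversely proportional to the local speed, Snell's law requires sin(θ)/v to be constant along the path, where θ is the angle from the vertical, while Galileo's law gives v = √(2gy) for a drop through height y. Writing sin(θ) = dx/ds, the two conditions combine to give

dx/ds = √(y/D)        i.e.        (dy/dx)² = (D − y)/y

for some constant D, and this differential equation is satisfied by the cycloid x = (D/2)(φ − sin φ), y = (D/2)(1 − cos φ), generated by a circle of diameter D.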

Presumably his enthusiasm would have been even greater had he known that the same curve describes radial gravitational freefall versus proper time in general relativity. We see from Bernoulli’s work that the variational techniques developed to solve problems like the brachistochrone also found physical application in what came to be called the principle of least action, a principle usually attributed to Maupertuis, or perhaps Leibniz (if one accepts the contention that “the best of all possible worlds” represents an expression of this principle). One particularly striking application of this variational approach was Fermat’s principle of least time for light rays, as discussed in Section 3.4. Essentially the same technique is used to determine the equations of a geodesic path in the curved spacetime of general relativity.

In the twentieth century, Planck was the most prominent enthusiast for the variational approach, asserting that “the principle of least action is perhaps that which, as regards form and content, may claim to come nearest to that ideal final aim of theoretical research”. Indeed he even (at times) argued that the principle manifests a deep teleological aspect of nature, since it can be interpreted as a global imperative, i.e., systems evolve locally in a way that extremizes (or makes stationary) certain global measures in a temporally symmetrical way, as if the final state were already determined. He wrote

In fact, the least-action principle introduces an entirely new idea into the concept of causality: The causa efficiens, which operates from the present into the future and makes future situations appear as determined by earlier ones, is joined by the causa finalis for which, inversely, the future – namely, a definite goal – serves as the premise from which there can be deduced the development of the processes which lead to this goal.

It’s surprising to see this called “an entirely new idea”, considering that causa finalis was among the four fundamental kinds of causation enunciated by Aristotle. In any case, throughout his life the normally austere and conservative Planck continued to have an almost mystical reverence for the principle of least action, arguing that it is not only “the most comprehensive of all physical laws”, but that it actually represents the purest expression of the thoughts of God.

Interestingly, Fermat himself was much less philosophically committed to the principle that he himself originated (somewhat like Einstein’s ambivalence toward the quantum theory). After being challenged on the fundamental truth of the "least time" principle as a law of nature by the Cartesian Clerselier, Fermat replied in exasperation

I do not pretend and I have never pretended to be in the secret confidence of nature. She moves by paths obscure and hidden...

Fermat was content to regard the principle of least time as a purely abstract mathematical theorem, describing - though not necessarily explaining – the behavior of light.

8.4 Refractions on Relativity

For now we see through a glass, darkly; but then face to face. Now I know in part, but then shall I know even as also I am known. I Corinthians 13,12

We saw in Section 3.4 that Fermat's Principle of least time predicts the paths of light rays passing through a plane boundary between regions of constant refractive index, but to more fully appreciate this principle it's useful to develop the equations of motion for light rays in a medium with arbitrarily varying refractive index. First, notice that Snell's law enables us to determine the paths of optical rays passing through a discrete boundary between regions of constant refractive index, but doesn't explicitly tell us the path of light in a medium of continuously varying refractivity. To determine this, we can refer to Fresnel's equations, which give the intensities of the reflected and transmitted rays at a boundary. For light striking the boundary perpendicularly, the fraction R of the incident energy reflected at a boundary between regions of index n₁ and n₂ is

R = [ (n₁ − n₂) / (n₁ + n₂) ]²

Consequently, the fraction of incident energy that is transmitted is 1 − R. However, this formula assumes the thickness of the boundaries between regions of constant refractive index is small in comparison with the wavelength of the light, whereas in many real circumstances the density of the medium does not change abruptly at well-defined boundaries, but varies continuously as a function of position. Therefore, we would like a means of tracing rays of light as they pass through a medium with a continuously varying index of refraction.

Notice that if we approximate a continuously changing index of refraction by a sequence of thin uniform plates, as we add more plates the ratio n₂/n₁ from one region to the next approaches 1, and so according to Snell's Law the value of θ₂ approaches the value of θ₁. From Fresnel's equations we see that in this case the fraction of incident energy that is reflected goes to zero, and we find that a light ray with a given trajectory proceeds in just one direction through the continuous medium (provided the gradient of the scalar field n(x,y) is never too great relative to the wavelength of the light). So, it should be possible to predict the unique path of transmission of a light ray in a medium with continuously varying index of refraction.

Perhaps the most direct approach is via the usual calculus of variations. (For convenience we'll just work in 2 dimensions, but all the formulas can immediately be generalized to three dimensions.) We know that the index of refraction n at a point (x,y) equals c/v, where v is the velocity of light at that point. Thus, if we parameterize the path by the equations x = x(u) and y = y(u), the "optical path length" from point A to point B (i.e., the time taken by a light beam to traverse the path) is given by the integral

T = (1/c) ∫ n(x,y) √(ẋ² + ẏ²) du

where dots signify derivatives with respect to the parameter u. To make this integral an extremum, let f denote the integrand function

f = n(x,y) √(ẋ² + ẏ²)

Then the Euler equations (introduced in Section 5.4) are

d/du (∂f/∂ẋ) − ∂f/∂x = 0        d/du (∂f/∂ẏ) − ∂f/∂y = 0

which gives

(∂n/∂x) √(ẋ² + ẏ²) = d/du [ n ẋ / √(ẋ² + ẏ²) ]        (∂n/∂y) √(ẋ² + ẏ²) = d/du [ n ẏ / √(ẋ² + ẏ²) ]

Now, if we define our parameter u as the spatial path length s, then we have ẋ² + ẏ² = 1, and so the above equations reduce to

∂n/∂x = d/ds ( n dx/ds )        (1a)

∂n/∂y = d/ds ( n dy/ds )        (1b)

These are the "equations of motion" for a photon in a heterogeneous medium, as they are usually formulated, in terms of the spatial path parameter s. However, another approach to this problem is to define a temporal metric on the space, i.e., a metric that represents the time taken by a light beam to travel from one point to another. This temporal approach has remarkable formal similarities to Einstein's metrical theory of gravity.

According to Fermat's Principle, the path taken by a ray of light from one point to another is such that the time is minimal (for slight perturbations of the path). Therefore, if we define a metric in the x,y space such that the metrical "distance" between any two infinitesimally close points is proportional to the time required by a photon to travel from one point to the other, then the paths of photons in this space will correspond to the geodesics.

Since the refractive index n is a smooth continuous function of x and y, it can be regarded as constant in a sufficiently small region surrounding any particular point (x,y). The incremental spatial distance from this point to the nearby point (x+dx, y+dy) is given by ds² = dx² + dy², and the incremental time dτ for a photon to travel the incremental distance ds is simply ds/v where v = c/n. Therefore, we have dτ = (n/c)ds, and so our metrical line element for this space is

(dτ)² = (n/c)² [ (dx)² + (dy)² ]        (2)

If, instead of x and y, we name our two spatial coordinates x¹ and x² (where these superscripts denote indices, not exponents) we can express equation (2) in tensor form as

(dτ)² = g_uv dx^u dx^v        (3)

where g_uv is the covariant metric tensor

g_uv = (n/c)² δ_uv,    i.e.,    g_11 = g_22 = (n/c)²,    g_12 = g_21 = 0        (4)

Note that in equation (3) we have invoked the usual summation convention. The contravariant form of the metric tensor, denoted by g^uv, is the matrix inverse of (4).

According to Fermat's Principle, the path of a light ray must be a geodesic path based on this metric. As discussed in Section 5.4, the equations of a geodesic path are

d²x^a/dτ² + Γ^a_uv (dx^u/dτ)(dx^v/dτ) = 0        (5)

Based on the metric of our 2D optical space we have the eight Christoffel symbols

Inserting these into (5) gives the equations for geodesic paths, which define the paths of light rays in this region. Reverting back to our original notation of x,y for our spatial coordinates, the differential equations for ray paths in this medium of continuously varying refractive index are

where nx and ny denote partials derivatives of n with respect to x and y respectively. These are the equations of motion for light based on the temporal metric approach.

To show that these equations, based on the temporal path parameter τ, are equivalent to equations (1a) and (1b) based on the spatial path parameter s, notice that s and τ are linked by the relation ds/dτ = c/n where c is the velocity of light. Multiplying both inside and outside the right hand side expression of (1a) by the unity of (n/c)(ds/dτ) we get

Expanding the derivative on the right side gives

Since n is a function of x and y, we can express the derivative dn/dτ using the total derivative

Substituting this into the previous equation and factoring gives

Recalling that c/n = ds/dτ, we can multiply both sides of this equation by (ds/dτ)² to give

Since s is the spatial path length, we have (ds)² = (dx)² + (dy)², so we can substitute for ds on the left hand side and rearrange terms to give the result

which is the same as the geodesic equation (6a). A similar derivation shows that (1b) is equivalent to the geodesic equation (6b), so the two sets of equations of motion for light rays are identical.

With these equations we can compute the locus of rays emanating from any given point in a medium with arbitrarily varying index of refraction. Of course, if the index of refraction is constant then the right hand sides of equations (6) vanish and the equations for light rays reduce to

    d²x/dτ² = 0          d²y/dτ² = 0

which are simply the equations of straight lines. For a less trivial case, suppose the index of refraction in this region is a linear function of the x parameter, i.e., we have n(x) = Ax + B for some constants A and B. In this case the equations of motion reduce to

    d²x/dτ² = [A/(Ax+B)] [ (dy/dτ)² - (dx/dτ)² ]

    d²y/dτ² = -[2A/(Ax+B)] (dx/dτ)(dy/dτ)
With A=5 and B=1/5 the locus of rays emanating from a point is as shown in Figure 1.

Figure 1

The correctness of the rays in Figure 1 is easily verified by noting that in a medium with n varying only in the horizontal direction it follows immediately from Snell's law that the product n sin(θ) must be constant, where θ is the angle which the ray makes with the horizontal axis. We can verify numerically that the rays shown in Figure 1, generated by the geodesic equations, satisfy Snell's Law throughout.
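For instance, here is a small sketch (using the geodesic equations (6) in the form reconstructed above, with A = 5 and B = 1/5 as in the figure, and an arbitrarily chosen starting point and direction) that integrates one ray numerically and confirms that n sin(θ) remains constant along the path.

```python
# Sketch: integrate the ray/geodesic equations (6) for n(x) = A*x + B and
# check Snell's invariant n*sin(theta) along the ray.  The equations are the
# reconstruction given above; the starting point and direction are arbitrary.
import numpy as np
from scipy.integrate import solve_ivp

A, B = 5.0, 0.2
n = lambda x: A * x + B

def geodesic(tau, state):
    x, y, vx, vy = state
    g = A / n(x)                       # n_x / n   (n_y = 0 here)
    ax = g * (vy**2 - vx**2)           # equation (6a)
    ay = -2.0 * g * vx * vy            # equation (6b)
    return [vx, vy, ax, ay]

# start where n = 5, heading 60 degrees above the x axis, with speed 1/n
x0 = (5.0 - B) / A
theta0 = np.radians(60.0)
v0 = 1.0 / n(x0)
state0 = [x0, 0.0, v0 * np.cos(theta0), v0 * np.sin(theta0)]

sol = solve_ivp(geodesic, (0.0, 2.0), state0, rtol=1e-10, atol=1e-12)
x, y, vx, vy = sol.y
snell = n(x) * vy / np.hypot(vx, vy)   # n * sin(theta), theta measured from x axis
print(snell.min(), snell.max())        # constant to within the integration tolerance
```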

We've placed the origin of these rays at the location where n = 5. The left-most point on this family of curves emanating from that point is at the x location where n = 0. Of course, in reality we could not construct a medium with n = 0, since that represents an infinite speed of light. It is, however, possible for the index of refraction of a medium to be less than 1 for certain frequencies, such as x-rays in glass. This implies that the velocity of light exceeds c, which may seem to conflict with relativity. However, the "velocity of light" that appears in the denominator of the refractive index is actually the phase velocity, rather than the group velocity, and the latter is typically the speed of energy transfer and signal propagation. (The phenomenon of "anomalous dispersion" can actually result in a group velocity greater than c, but in all cases the signal velocity is less than or equal to c.)

Incidentally, these ray paths, in a medium with linearly varying index of refraction, are catenary curves, which is the shape made by a heavy cable slung between two attachment points in uniform gravity. To prove this, let's first rotate the medium so that the refractive index varies vertically instead of horizontally, and let's slide the vertical axis so that n = Ay for some constant A. The general form of a catenary curve (with vertical axis of symmetry) is

    y = m cosh(x/m)

for some constant m. It follows that dy/dx = sinh(x/m). Also, the incremental distance along the path is given by (ds)² = (dx)² + (dy)², so we can substitute for dy to give

    (ds)² = [1 + sinh(x/m)²] (dx)² = cosh(x/m)² (dx)²

Therefore, we have ds = cosh(x/m) dx, which can be integrated to give s = m sinh(x/m). Interestingly, this implies that m(dy/dx) = s, so the slope of a catenary (with vertical axis) is proportional to the distance along the curve from the minimum point. Also, from the relation x = m arcsinh(s/m) we have dx/ds = m/√(s² + m²), so we can multiply this by dy/dx = s/m to give dy/ds = s/√(s² + m²). Integrating this gives y as a function of s, so we have the parametric equations

    x = m arcsinh(s/m)          y = √(s² + m²)

Letting n0 denote the index of refraction at the minimum point of the catenary (where the curve is parallel to the lines of constant refractive index), and letting A denote dn/dy, we have m = n0/A. For other values of y we have n = Ay = n0 cosh(x/m). We can verify that the catenary represents the path of a light ray in a medium whose index of refraction varies linearly as a function of y by inserting these expressions for x, y, and n (and their derivatives) into the equations of motion (1).

The surface of revolution of one of these catenary curves about the vertical axis through the vertex of the envelope is called a catenoid. Each point inside the envelope of this family of curves is contained in exactly two curves, and the catenoid given by the shorter of these two curves is a minimal surface. It's also interesting to note that the "envelope" of rays emanating from a given point approaches a parabola whose focus is the given point. This parabola and focus are shown as a dotted line in Figure 1.

For a less trivial example, the figure below shows the rays in a medium where the index of refraction is spherically symmetrical and drops off linearly with distance from some central point, which gives ray paths that are hypocycloidal loops.

Figure 2

It's also possible to arrange for the light rays to be loxodromic spirals, as shown below.

Figure 3

Finally, Figure 4 shows that the rays can circulate from one point to a central point in accord with "circles of Apollonius", much like the iterations of Mobius transformations in the complex plane.

Figure 4

This occurs with n varying inversely as the square of the distance from the central point. Theoretically, the light from any point, with an initial trajectory in any direction, will eventually turn around and head toward the singularity of infinite density at the center, which the ray approaches asymptotically slowly. Thus, it might be called a "black sphere" lens that refracts all incident light toward its center. Of course, there are obvious practical difficulties with actually constructing an object like this, not least of which is the infinite density at the center, as well as the problems of reflection and dispersion.

As an aside, it's interesting to compare the light deflection predicted by the Schwarzschild solution with the deflection that would be given by a simple "refractive medium" with a scalar index of refraction defined at each point. We've seen that the "least time" metric in a plane is

    (dt)² = n(x,y)² [ (dx)² + (dy)² ]

where we have set c=1, and n(x,y) is the index of refraction at the point (x,y). If we write this in polar coordinates r,φ, and if we assume that n depends only on r, this can be written as

    (dt)² = n(r)² [ (dr)² + r² (dφ)² ]

for some function n(r). In order to match the Schwarzschild radial speed of light dr/dt we must have n(r) = r/(r - 2m), which completely determines the "refractive model" metric for light rays on the plane. The corresponding geodesic equations are

These are similar, but not identical, to the geodesic equations based on the Schwarzschild metric, as can be seen by comparing them with equations (2) in Section 6.2. The weak field deflection is almost indistinguishable. To see this, we proceed as we did with the Schwarzschild metric, integrating the second geodesic equation and determining the constant of integration from the perihelion condition at r = r0 to give

Substituting this into the metric divided by (dt)² and solving for dr/dt gives

Dividing d/dt by dr/dt gives d/dr. Then, making the substitution  = r0/r as before we arrive at the integral for the angular travel from the perihelion to infinity

Doubling this gives the total angular travel between the incoming and outgoing asymptotes, and subtracting π from this travel gives the deflection δ. Expanding the integral in powers of m/r0, we have the result

    δ = 4(m/r0) + (terms of second and higher order in m/r0)
Thus the first-order deflection for this simple refraction model is the same as for the Schwarzschild solution. The solutions differ in the second order, but this difference is much too small to be measured in the weak gravitational fields found in our solar system. However, the difference would be significant near a "black hole", because the radius for lightlike circular orbits in this refractive model is 4m, as opposed to 3m for the Schwarzschild metric.
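As a numerical check on this comparison, the following sketch traces a ray through a medium with n(r) = r/(r - 2m) by integrating the spatial ray equation (1) written as a first-order system, and compares the resulting bending angle with 4m/b for a ray of impact parameter b (the mass and impact parameter values are arbitrary choices for illustration, and to first order r0 and b are interchangeable).

```python
# Sketch: numerically trace a light ray through a medium with n(r) = r/(r - 2m)
# (the "refractive model" above) and compare the bending angle with the
# first-order prediction 4m/b.  Uses the ray equation d/ds(n dr/ds) = grad n,
# written as a first-order system; m and the impact parameter b are arbitrary.
import numpy as np
from scipy.integrate import solve_ivp

m = 1.0
def n(r):     return r / (r - 2.0 * m)
def dn_dr(r): return -2.0 * m / (r - 2.0 * m)**2

def ray(s, state):
    x, y, px, py = state                  # (px, py) = n * (unit tangent)
    r = np.hypot(x, y)
    g = dn_dr(r) / r                      # grad n = (dn/dr) * (x, y)/r
    return [px / n(r), py / n(r), g * x, g * y]

b, D = 2000.0, 1.0e6                      # impact parameter and start distance
state0 = [-D, b, n(np.hypot(D, b)), 0.0]  # ray initially moving in the +x direction
sol = solve_ivp(ray, (0.0, 2.2 * D), state0, rtol=1e-9, atol=1e-9)

px, py = sol.y[2, -1], sol.y[3, -1]
deflection = abs(np.arctan2(py, px))      # angle between outgoing and incoming rays
print(deflection, 4.0 * m / b)            # agree to first order in m/b
```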

On the other hand, it's important to keep in mind that the physical significance of the usual Schwarzschild coordinates can't be taken for granted when translated into a putative model based on simple refraction. The angular coordinates are fairly unambiguous, but we have various reasonable choices for the radial parameter. One common choice gives the so-called isotropic coordinates. For the radial coordinate we use ρ, defined with respect to the Schwarzschild coordinate r by the relation

    r = ρ (1 + m/2ρ)²

Note that the perimeter of a circular orbit of radius r is 2πr, consistent with Euclidean geometry, whereas the perimeter of a circle of radius ρ is roughly 2πρ(1 + m/ρ). In terms of this radial parameter, the Schwarzschild metric takes the form

    (ds)² = [(1 - m/2ρ)/(1 + m/2ρ)]² (dt)² - (1 + m/2ρ)⁴ [ (dρ)² + ρ²(dθ)² + ρ²sin(θ)²(dφ)² ]

This leads to the positive-definite metric for light paths

    (dt)² = [(1 + m/2ρ)³/(1 - m/2ρ)]² [ (dρ)² + ρ²(dθ)² + ρ²sin(θ)²(dφ)² ]

Hence if we postulate a Euclidean space with the coordinates ρ,θ,φ centered on the mass m, and a refractive index varying with ρ according to the formula

    n(ρ) = (1 + m/2ρ)³ / (1 - m/2ρ)
then the equations of motion for light are formally identical to those predicted by general relativity. However, when we postulate a Euclidean space with the radial parameter r we are neglecting the fact that the perimeter of a circle of radius r in this space does not have the value 2πr, so this is not an entirely self-consistent interpretation, as opposed to the usual "curvature" interpretation of general relativity. In addition, physical refraction is ordinarily dependent on the frequency of the light, whereas gravitational deflection is not, so in order to achieve the formal match between the two we must make the physically implausible assumption of a refractive index that is independent of frequency. Furthermore, it isn't self-evident that a refractive model can correctly account for the motions of timelike objects, whereas the curved-spacetime interpretation handles all these motions in a unified and self-consistent manner.

8.5 Scholium

I earnestly ask that all this be appraised honestly, and that defects in matters so very difficult be not so much reprehended as investigated and kindly supplemented by new endeavors of my readers. Isaac Newton, 1687

Considering that the first Scholium of Newton's Principia begins with the famous assertion "absolute, true, and mathematical time...flows equably, without relation to anything external", it's ironic that Newton's theory of universal gravitation can be interpreted as a theory of variations in the flow of time. Suppose in Newton's absolute space we establish the Cartesian coordinates x,y,z, and then assign a fourth coordinate, t, to every point. We will call this the coordinate time parameter, but we don't necessarily identify this with the "true time" of events. Instead we postulate that the true lapse of time along an incremental timelike path is dτ, given by

    (dτ)² = g00 (dt)² + k [ (dx)² + (dy)² + (dz)² ]          (1)
From the Galilean standpoint, we assume that a single set of assignments of the time coordinate t to events corresponds to the lapses of proper time dτ along any and all paths, which implies that g00 = 1 and k = 0. However, this can only be known to within some observational tolerance. Strictly speaking we can say only that g00 is extremely close to 1, and the constant k is very close to zero (in conventional units of measure).

Using indices with x0 = t, x1 = x, x2 = y, and x3 = z, we can re-write (1) as the summation

    (dτ)² = gab dx^a dx^b

where gab is the diagonal array

    gab = diag( g00, k, k, k )
Now let's define a four-dimensional array of numbers R^a_bcd representing the second partial derivatives of the gbd with respect to every pair of coordinates x^a, x^c

Also, we define the "contraction" of this array (using the summation convention for repeated indices) as

Since the only non-zero components of R^a_bcd are those involving the second derivatives of g00, it follows that the only non-zero component of Rab is

If we assume g00 is independent of the coordinate t (meaning that the metrical configuration is static), the first term vanishes and we find that R00 is just the Laplacian of g00. Hence if we take our vacuum field equations to be Rab = 0, this is equivalent to requiring that the Laplacian of g00 vanish, i.e.,

    ∂²g00/∂x² + ∂²g00/∂y² + ∂²g00/∂z² = 0
For convenience let us define the scalar φ = g00/2. If we consider just spherically symmetrical fields about the origin, we have φ = φ(r) and so

    ∂φ/∂x = (dφ/dr)(x/r)

and similarly for the partials with respect to y and z. Since ∂r/∂x = x/r, we have

    ∂²φ/∂x² = (d²φ/dr²)(x²/r²) + (dφ/dr)(1/r - x²/r³)

and similarly for the y and z partials. Making these substitutions back into the Laplace equation gives

    d²φ/dr² + (2/r)(dφ/dr) = 0
This simple linear differential equation has the unique solution dφ/dr = J/r², where J is a constant of integration, and so we have φ = -J/r + K for some constants J and K.

Incidentally, it's worth noting that this applies only in three dimensions. If we were working in just two dimensions, the constant "2" in the above equation would be "1", and the unique solution would be dφ/dr = J/r, giving φ = J ln(r) + K. This shows that Newtonian gravity "works" only with three space dimensions, just as general relativity works only with four spacetime dimensions.
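These two solutions are easy to confirm symbolically; the following sketch (using the sympy library) checks that φ = -J/r + K satisfies the radial Laplace equation with the constant "2", while φ = J ln(r) + K satisfies the corresponding two-dimensional equation with the constant "1".

```python
# Sketch: check that phi = -J/r + K satisfies phi'' + (2/r) phi' = 0 (three
# dimensions), while phi = J*ln(r) + K satisfies phi'' + (1/r) phi' = 0 (two
# dimensions), as claimed above.
import sympy as sp

r, J, K = sp.symbols('r J K', positive=True)

phi3 = -J / r + K
print(sp.simplify(sp.diff(phi3, r, 2) + (2 / r) * sp.diff(phi3, r)))   # -> 0

phi2 = J * sp.log(r) + K
print(sp.simplify(sp.diff(phi2, r, 2) + (1 / r) * sp.diff(phi2, r)))   # -> 0
```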

Now that we've solved for the g00 field we need the equations of motion. We assume that objects in gravitational free-fall follow geodesics through the spacetime, so the equations of motion are just the geodesic equations

    d²x^a/dτ² + Γ^a_bc (dx^b/dτ)(dx^c/dτ) = 0
where x denote the quasi-Euclidean coordinates t,x,y,z defined above. Since we have assumed that the scale factor k between spatial and temporal coordinates is virtually zero, and that g00 is nearly equal to unity, it's clear that all the speed components dx/d, dy/d, dz/d are extremely small, whereas the derivative dt/d is virtually equal to 1. Neglecting all terms containing one or more of the speed components, we're left with the zeroth- order approximation for the spatial accelerations

From the definition of the Christoffel symbols we have

    Γ^x_tt = (1/2) g^xa ( ∂gat/∂t + ∂gta/∂t - ∂gtt/∂x^a )

and similarly for the Christoffel symbols in the y and z equations. Since the metric components are independent of time, the partials with respect to t are all zero. Also, the metric tensor gab and its inverse g^ab are both diagonal, and the non-zero components of the latter are virtually equal to 1, 1/k, 1/k, 1/k. All the mixed components of g^ab vanish, so we are left with

    Γ^x_tt = -(1/2k) ∂gtt/∂x
and similarly for Γ^y_tt and Γ^z_tt. As a result, the equations of motion in the weak slow limit are closely approximated by

    d²x/dτ² = (1/2k) ∂gtt/∂x          d²y/dτ² = (1/2k) ∂gtt/∂y          d²z/dτ² = (1/2k) ∂gtt/∂z
We've seen that the Laplace equation requires gtt to be of the form 2K - 2J/r for some constants K and J in a spherically symmetrical field, and since we expect dt/dτ to approach 1 as r increases, we can set 2K = 1. With gtt = 1 - 2J/r we have

    ∂gtt/∂x = 2Jx/r³

and similarly for the partials with respect to y and z. Therefore the approximate equations of motion in the weak slow limit are

    d²x/dτ² = (J/k)(x/r³)          d²y/dτ² = (J/k)(y/r³)          d²z/dτ² = (J/k)(z/r³)
If we set J/k = -m, i.e., to the negative of the mass of the gravitating source, these are exactly the equations of motion for Newton's inverse-square attraction. Interestingly, this implies that precisely one of J,k is negative. If we choose to make J negative, then the gravitational "potential" has the form gtt = 1 + 2|J|/r, which signifies that the potential would increase as we approach the source, as would the rate of proper time along a stationary worldline with respect to coordinate time. In such a universe the value of k would need to be positive in order for gravity to be attractive, i.e., in order for geodesics to converge on the gravitating source. On the other hand, if we choose to make J positive, so that the potential and the rate of proper time decrease as we approach the source, then the constant k must be negative. Referring back to the original line element, this implies an indefinite metric. Naturally we can scale our units so that |k| = 1, but the sign of k is significant. Thus from the observation that "things fall down" we can nearly infer the Minkowski metrical structure of spacetime.
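The reduction described above can also be checked symbolically. The sketch below (again using sympy) computes the Christoffel symbol Γ^x_tt for the static diagonal metric with gtt = 1 - 2J/r and confirms that the resulting acceleration is Jx/(kr³), which becomes the Newtonian inverse-square attraction when J/k = -m.

```python
# Sketch: verify the weak-field limit worked out above.  For the metric
# diag(g_tt, k, k, k) with g_tt = 1 - 2J/r, the Christoffel symbol Gamma^x_tt
# is -(1/2k) d(g_tt)/dx, so the geodesic acceleration d^2x/dtau^2 = -Gamma^x_tt
# reduces to Newton's inverse-square law when J/k = -m.
import sympy as sp

x, y, z, J, k, m = sp.symbols('x y z J k m')
r = sp.sqrt(x**2 + y**2 + z**2)
g_tt = 1 - 2 * J / r

Gamma_x_tt = -sp.Rational(1, 2) / k * sp.diff(g_tt, x)   # diagonal, static metric
accel_x = -Gamma_x_tt                                    # d^2x/dtau^2 with dt/dtau ~ 1

print(sp.simplify(accel_x))                              # -> J*x/(k*r^3)
print(sp.simplify(accel_x.subs(J, -m * k)))              # -> -m*x/r^3, inverse-square attraction
```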

The fact that we can derive the correct trajectories of free-falling objects based on either of two diametrically opposed assumptions is not without precedent. This is very closely related to how Descartes and Newton were able to deduce the correct law of refraction based on the assumption that light travels more rapidly in denser media, while Fermat deduced the same law from the opposite assumption.

In any case, taking k = 1 and J = m, we see that Newton's law of gravitation in the vacuum is R = 0, closely paralleling the vacuum field equations of general relativity, which represents the vanishing of the Laplacian of g00/2. At a point with non-zero mass density we simply set this equal to 4 to give Poisson's equation. Hence is we define the energy-momentum array

we can express Newton's geometrical spacetime law of gravitation as

This can be compared with Einstein's field equations

    Rab - (1/2) gab R = 8π Tab
Of course the "R" and "T" arrays in Newton's law are based on simple partial derivatives, rather than covariant differentiation, so they are not precisely identical to the Ricci tensor and the energy-momentum tensor of general relativity. However, the definitions are close enough that the tensors of general relativity can rightly be viewed as the natural generalizations of the simple Newtonian arrays. The above equations show that the acceleration of gravity is proportional to the rate of change of gtt as a function of r. At any given r we have dτ/dt = √gtt, so gtt corresponds to the squared "rate of proper time" (with respect to coordinate time) at the given r. It follows that our feet are younger than our heads, because time advances more slowly as we get closer to the center of the field. So, despite Newton's conception of the perfectly equable flow of time, his theory of gravitation can well be interpreted as a description of the effects of the inequable flow of time. In essence, the effect of Newtonian gravity can be explained in terms of the flow of time being slower near massive objects, and just as a refracted ray of light veers toward the medium in which light goes more slowly (and as a tank veers in the direction of the slower tread-track), objects progressing in time veer in the direction of slower proper time, causing them to accelerate toward massive objects.

8.6 On Gauss's Mountains

Grossmann is getting his doctorate on a topic that is connected with fiddling around and non-Euclidean geometry. I don’t know exactly what it is. Einstein to Mileva Maric, 1902

One of the most famous stories about Gauss depicts him measuring the angles of the great triangle formed by the mountain peaks of Hohenhagen, Inselberg, and Brocken for evidence that the geometry of space is non-Euclidean. It's certainly true that Gauss acquired geodetic survey data during his ten-year involvement in mapping the Kingdom of Hanover during the years from 1818 to 1832, and this data included some large "test triangles", notably the one connecting those three mountain peaks, which could be used to check for accumulated errors in the smaller triangles. It's also true that Gauss understood how the intrinsic curvature of the Earth's surface would theoretically result in slight discrepancies when fitting the smaller triangles inside the larger triangles, although in practice this effect is negligible, because the Earth's curvature is so slight relative to even the largest triangles that can be visually measured on the surface. Still, Gauss computed the magnitude of this effect for the large test triangles because, as he wrote to Olbers, "the honor of science demands that one understand the nature of this inequality clearly". (The government officials who commissioned Gauss to perform the survey might have recalled Napoleon's remark that Laplace as head of the Department of the Interior had "brought the theory of the infinitely small to administration".) It is sometimes said that the "inequality" which Gauss had in mind was the possible curvature of space itself, but taken in context it seems he was referring to the curvature of the Earth's surface.

On the other hand, if the curvature of space was actually great enough to be observed in optical triangles of this size, then presumably Gauss would have noticed it, so we may still credit him with having performed an empirical observation of geometry, but in this same sense every person who ever lived has made such observations. It might be more meaningful to name people who have explicitly argued against the empirical status of geometry, i.e., who have claimed that the character of spatial relations could be known without empirical observation. In his "Critique of Pure Reason", Kant famously declared that Euclidean geometry is the only possible way in which the mind can organize information about extrinsic spatial relations. One could also cite Plato and other idealists and a priorists. On the other hand, Poincare advocated a conventionalist view of geometry, arguing that we can always, if we wish, cast our physics within a Euclidean spatial framework - provided we are prepared to make whatever adjustments in our physical laws are necessary to preserve this convention. In any case, it seems reasonable to agree with Buhler, who concludes in his biography of Gauss that "the oft-told story according to which Gauss wanted to decide the question [of whether space is perfectly Euclidean] by measuring a particularly large triangle is, as far as we know, a myth."

The first person to publicly propose an actual test of the geometry of space was apparently Lobachevski, who suggested that one might "investigate a stellar triangle for an experimental resolution of the question." The "stellar triangle" he proposed was the star Sirius and two different positions of the Earth at 6-month intervals. This was used by Lobachevski as an example to show how we could place limits on the deviation from flatness of actual space, based on the fact that, in a hyperbolic space of constant curvature, there is a limit to how small a star's parallax can be, even for the most distant star. Gauss had already (in private correspondence with Taurinus in 1824) defined the "characteristic length" of a hyperbolic space, which he called "k", and had derived several formulas for the properties of such a space in terms of this parameter. For example, the circumference of a circle of radius r in a hyperbolic space whose "characteristic length" is k is given by

    C = 2πk sinh(r/k)
Since sinh(x) = x + x³/3! + ..., it follows that C approaches 2πr as k increases to infinity. From the fact that the maximum parallax of Sirius (as seen from the Earth at various times) is 1.24 seconds of arc, Lobachevski deduced that the value of k for our space must be at least 166,000 times the radius of the Earth's orbit. Naturally the same analysis for more distant stars gives an even larger lower bound on k.
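Lobachevski's bound is easy to reproduce: for small angles the minimum possible parallax over a baseline of one orbital radius a is roughly a/k, so an observed parallax p (in radians) implies k ≥ a/p. The little computation below (a sketch; the small-angle approximation is assumed) gives about 166,000 orbital radii for a parallax of 1.24 seconds of arc.

```python
# Sketch: reproduce Lobachevski's lower bound on the hyperbolic scale k.  In a
# hyperbolic space the parallax of a star seen over a baseline of one orbital
# radius "a" can never fall below roughly a/k (small-angle approximation), so
# an observed parallax p gives k >= a/p.
import math

parallax_arcsec = 1.24
p = math.radians(parallax_arcsec / 3600.0)   # parallax in radians
k_min_in_orbit_radii = 1.0 / p               # lower bound on k, in units of a
print(round(k_min_in_orbit_radii))           # ~166,000 times the Earth's orbital radius
```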

The first definite measurement of parallax for a fixed star was performed by Friedrich Bessel (a close friend of Gauss) in 1838, on the star 61 Cygni. Shortly thereafter he measured Sirius (and discovered its binary nature). Lobachevski's first paper on "the new geometry" was presented as a lecture at Kazan in 1826, followed by publications in 1829, 1835, 1840, and 1855 (a year before his death). He presented his lower bound on "k" in the later editions based on the still fairly recent experimental results of stellar parallax measurements. In 1855 Lobachevski was completely blind, so he dictated his exposition.

The other person credited with discovering non-Euclidean geometry, Janos Bolyai, was the son of Wolfgang Bolyai, who was a friend (almost the only friend) of Gauss during their school days at Gottingen in the late 1790's. The elder Bolyai had also been interested in the foundations of geometry, and spent many years trying to prove that Euclid's parallel postulate is a consequence of the other postulates. Eventually he concluded that it had been a waste of time, and he became worried when his son Janos became interested in the same subject. The alarmed father wrote to his son

For God's sake, I beseech you, give it up. Fear it no less than sensual passions because it, too, may take all your time, and deprive you of your health, peace of mind, and happiness in life.

Undeterred, Janos continued to devote himself to the study of the parallel postulate, and in 1829 he succeeded in proving just the opposite of what his father (and so many others) had tried in vain to prove. Janos found (as had Gauss, Taurinus, and Lobachevski just a few years earlier) that Euclid's parallel postulate is not a consequence of the other postulates, but is rather an independent assumption, and that alternative but equally consistent geometries based on different assumptions may be constructed. He called this the "Absolute Science of Space", and wrote to his father that "I have created a new universe from nothing". The father then, forgetting his earlier warnings, urged Janos to publish his findings as soon as possible, noting that

...ideas pass easily from one to another, and secondly... many things have an epoch, in which they are found at the same time in several places, just as violets appear on every side in spring.

Naturally the elder Bolyai sent a copy of his son's spectacular discovery to Gauss, in June of 1831, but it was apparently lost in the mail. Another copy was sent in January of 1832, and then seven weeks later Gauss sent a reply to his old friend:

If I commenced by saying that I am unable to praise this work, you would certainly be surprised for a moment. But I cannot say otherwise. To praise it would be to praise myself. Indeed the whole contents of the work, the path taken by your son, the results to which he is led, coincide almost entirely with my meditations, which have occupied my mind partly for the last thirty or thirty-five years. So I remained quite stupefied. So far as my own work is concerned, of which up till now I have put little on paper, my intention was not to let it be published during my lifetime. ... I have found very few people who could regard with any special interest what I communicated to them on this subject. ...it was my idea to write down all this later so that at least it should not perish with me. It is therefore a pleasant surprise for me that I am spared this trouble, and I am very glad that it is just the son of my old friend, who takes the precedence of me in such a remarkable manner.

In his later years Gauss' response to many communications of new mathematical results was similar to the above. For example, he once remarked that a paper of Abel's saved him the trouble of having to publish about a third of his results concerning elliptic integrals. Likewise he confided to friends that Jacobi and Eisenstein had "spared him the trouble" of publishing important results that he (Gauss) had possessed since he was a teenager, but had never bothered to publish. Dedekind even reports that Gauss made a similar comment about Riemann's dissertation. It's true that Gauss' personal letters and notebooks substantiate to some extent his private claims of priority for nearly every major mathematical advance of the 19th century, but the full extent of his early and unpublished accomplishments did not become known until after his death, and in any case it wouldn't have softened the blow to his contemporaries. Janos Bolyai was so embittered by Gauss's backhanded response to his non-Euclidean geometry that he never published again.

As another example of what Wolfgang Bolyai called "violets appearing on every side", Maxwell's great 1865 triumph of showing that electromagnetic waves propagate at the speed of light was, to some degree, anticipated by others. In 1848 Kirchhoff had noted that the ratio of electromagnetic and electrostatic units was equal to the speed of light, although he gave no explanation for this coincidence. In 1858 Riemann presented a theory based on the hypothesis that electromagnetic effects propagate at a fixed speed, and then deduced that this speed must equal the ratio of electromagnetic and electrostatic units.

Even in this field we find that Gauss can plausibly claim priority for some interesting developments. Recall that, in addition to being the foremost mathematician of his day, Gauss was also prominent in studying the phenomena of electricity and magnetism (in fact the cgs unit of magnetic flux density is called a gauss), and even dabbled in electrodynamics. As mentioned in Section 3.5, he reached the conclusion that the keystone of electrodynamics would turn out to depend on an understanding of how electric effects propagate in time. In 1835 he wrote (in an unpublished paper, discovered after his death) that

Two elements of electricity in a state of relative motion attract or repel one another, but not in the same way as if they are in a state of relative rest.

He even suggested the following mathematical form for the complete electromagnetic force F between two particles with charges q1 and q2 in arbitrary states of motion

where r is the scalar distance, r is the vector distance, u is the relative velocity between the particles, and dots signify derivatives with respect to time. This formula actually gives the correct results for particles in uniform (inertial) motion, in which case the second derivative of the vector r is zero. However, the dot product in Gauss’s formula violates conservation of energy for general motions. A few years later (in 1845), Gauss’s friend Wilhelm Weber proposed a force law identical to Gauss’s, except he excluded the dot product, i.e., he proposed the formula

Weber pointed out that, unlike Gauss’s original formula, this force law satisfies conservation of energy, as shown by the fact that it can be derived from the potential function

In terms of this potential, the force given by the derivative with respect to r is precisely Weber’s force law. Equation (1) was used by Weber as the basis of his theory of electrodynamics published in 1846. Indeed this formula served as the basis for most theoretical studies of electromagnetism until it was finally superseded by Maxwell's theory beginning in the 1870s. It’s interesting that in order for energy to be conserved it was necessary to eliminate the vectors from Gauss’s formula, making the result entirely in terms of the scalar distance and its derivatives. Compare this with the separation equations discussed in Sections 4.2 and 4.4. Note that according to (1) the condition for the force between two charged particles to vanish is that the quantity in parentheses equals zero, i.e.,

Differentiating both sides and dividing by r gives the condition d³r/dt³ = 0, which is the same as equation (4) of Section 4.2 if we set N = 0. (The vanishing of the third derivative is also the condition for zero radiation reaction according to the Lorentz-Dirac equations of classical electrodynamics.) Interestingly, Karl Schwarzschild published a paper in 1903 describing in detail how the Gauss-Weber approach could actually have been developed into a viable theory. In any case, if the two charged particles are separating (without rotation) at a uniform speed v, Gauss' formula relates the electrostatic force F0 = q1q2/r² to the dynamic force as

So, to press the point, one could argue that Gauss' offhand suggestion for the formula expressing electrodynamic force already represents the seeds of Lorentz's molecular force hypothesis, from which follows the length contraction and time dilation of the Lorentz transformations and special relativity. In fact, pursuing this line of thought, Riemann (one of Gauss’ successors at Gottingen) proposed in 1858 that the electric potential should satisfy the equation

    ∂²φ/∂x² + ∂²φ/∂y² + ∂²φ/∂z² - (1/c²) ∂²φ/∂t² = -4πρ
where  is the charge density. This equation does indeed give the retarded electrostatic potential, which, combined with the similar equation for the vector potential, serves as the basis for the whole classical theory of electromagnetism. Assuming conservation of charge, the invariance of the Minkowski spacetime metric clearly emerges from this equation, as does the invariance of the speed of light in terms of any suitable (i.e., inertially homogeneous and isotropic) system coordinates.

8.7 Strange Meeting

It seemed that out of battle I escaped Down some profound dull tunnel... Wilfred Owen (1893-1918)

In the summer of 1913 Einstein accepted an offer of a professorship at the University of Berlin and membership in the Prussian Academy of Sciences. He left Zurich in the Spring of 1914, and his inaugural address before the Prussian Academy took place on July 2, 1914. A month later, Germany was at war with Belgium, Russia, France, and Britain. Surprisingly, the world war did not prevent Einstein from continuing his intensive efforts to generalize the theory of relativity so as to make it consistent with gravitation - but his marriage almost did. By April of 1915 he was separated from his wife Mileva and their two young sons, who had once again taken up residence in Zurich. The marriage was not a happy one, and he later wrote to his friend Besso that if he had not kept her at a distance, he would have been worn out, physically and emotionally. Besso and Fritz Haber (Einstein's close friend and colleague) both made efforts to reconcile Albert and Mileva, but without success.

It was also during this period that Haber was working for the German government to develop poison gas for use in the war. On April 22, 1915 Haber directed the release of chlorine gas on the Western Front at Ypres in Belgium. On May 23rd Italy declared war on Austria-Hungary, and subsequently against Germany itself. Meanwhile an Allied army was engaged in a disastrous campaign to take the Gallipoli Peninsula from Germany's ally, the Turks. Germany shifted the weight of its armies to the Eastern Front during this period, hoping to knock Russia out of the war while fighting a holding action against the French and British in the West. In a series of huge battles from May to September the Austro-German armies drove the Russians back 300 miles, taking Poland and Lithuania and eliminating the threat to East Prussia. Despite these defeats, the Russians managed to re-form their lines and stay in the war (at least for another two years). The astronomer Karl Schwarzschild was stationed with the German Army in the East, but still kept close watch on Einstein's progress, which was chronicled like a serialized Dickens novel in almost weekly publications of the Berlin Academy.

Toward the end of 1915, having failed to drive Russia out of the war, the main German armies were shifted back to the Western Front. Falkenhayn (the chief of the German general staff) was now convinced that a traditional offensive breakthrough was not feasible, and that Germany's only hope of ultimately ending the war on favorable terms was to engage the French in a war of attrition. His plan was to launch a methodical and sustained assault on a position that the French would feel honor-bound to defend to the last man. The ancient fortress of Verdun ("they shall not pass") was selected, and the plan was set in motion early in 1916. Falkenhayn had calculated that only one German soldier would be killed in the operation for every three French soldiers, so they would "bleed the French white" and break up the Anglo-French alliance. However, the actual casualty ratio turned out to be four Germans for every five French. By the end of 1916 a million men had been killed at Verdun, with no decisive change in the strategic position of either side, and the offensive was called off.

At about the same time that Falkenhayn was formulating his plans for Verdun, on Nov 25, 1915, Einstein arrived at the final form of the field equations for general relativity. After a long and arduous series of steps (and mis-steps), he was able to announce that "finally the general theory of relativity is closed as a logical structure". Given the subtlety and complexity of the equations, one might have expected that rigorous closed-form solutions for non-trivial conditions would be difficult, if not impossible, to find. Indeed, Einstein's computations of the bending of light, the precession of Mercury's orbit, and the gravitational redshift were all based on approximate solutions in the weak field limit. However, just two months later, Schwarzschild had the exact solution for the static isotropic field of a mass point, which Einstein presented on his behalf to the Prussian Academy on January 16, 1916. Sadly, Schwarzschild lived only another four months. He became ill at the front and died on May 11 at the age of 42.

It's been said that Einstein was scandalized by Schwarzschild's solution, for two reasons. First, he still imagined that the general theory might be the realization of Mach's dream of a purely relational theory of motion, and Einstein realized that the fixed spherically symmetrical spacetime of a single mass point in an otherwise empty universe is highly non-Machian. That such a situation could correspond to a rigorous solution of his field equations came as something of a shock, and probably contributed to his eventual rejection of Mach's ideas and positivism in general. Second, the solution found by Schwarzschild - which was soon shown by Birkhoff to be the unique spherically symmetric solution to the field equations (barring a non-zero cosmological constant) - contained what looked like an unphysical singularity. Of course, since the source term was assumed to be an infinitesimal mass point, a singularity at r = 0 is perhaps not too surprising (noting that Newton's inverse square law is also singular at r = 0). However, the Schwarzschild solution was also (apparently) singular at r = 2m, where m is the mass of the gravitating object in geometric units.

Einstein and others argued that it wasn't physically realistic for a configuration of particles of total mass M to reside within their joint Schwarzschild radius r = 2m, and so this "singularity" cannot exist in reality. However, subsequent analyses have shown that (barring some presently unknown phenomenon) there is nothing to prevent a sufficiently massive object from collapsing to within its Schwarzschild radius, so it's worthwhile to examine the formal singularity at r = 2m to understand its physical significance. We find that the spacetime manifold at this boundary need not be considered as singular, because it can be shown that the singularity is removable, in the sense that all the invariant measures of the field smoothly approach fixed finite values as r approaches 2m from either direction. Thus we can analytically continue the solution through the singularity.

Now, admittedly, describing the Schwarzschild boundary as an "analytically removable singularity" is somewhat unorthodox. It's customary to assert that the Schwarzschild solution is unequivocally non-singular at r = 2m, and that the intrinsic curvature and proper time of a free-falling object are finite and well-behaved at that radius. Indeed we derived these facts in Section 6.4. However, it's worth remembering that even with respect to the proper frame of an infalling test particle, we found that there remains a formal singularity at r = 2m. (See the discussion following equation 5 of Section 6.4.) The free-falling coordinate system does not remove the singularity, but it makes the singularity analytically removable. Similarly our derivation in Section 6.4 of the intrinsic curvature K of the Schwarzschild solution at r = 2m tacitly glossed over the intermediate result

Strictly speaking, the middle term on the right side is 0/0 (i.e., undefined) at r = 2m. Of course, we can divide the numerator and denominator by (r - 2m), but this step is unambiguously valid only if (r - 2m) is not equal to zero. If (r - 2m) does equal zero, this cancelation is still possible, but it amounts to the analytic removal of a singularity. In addition, once we have removed this singularity, the resulting term is infinite, formally equal to the third term, which is also infinite, but with opposite sign. We then proceed to subtract the infinite third term from the infinite second term to arrive at the innocuous-looking finite result K = -2m/r³ at r = 2m. Granted, the form of the metric coefficients and their derivatives depends on the choice of coordinates, and in a sense we can attribute the troublesome behavior of the metric components at r = 2m to the unsuitability of the traditional Schwarzschild coordinates r,t at this location. From this we might be tempted to conclude that the Schwarzschild radius has no physical significance. This is true locally, but globally the Schwarzschild radius is physically significant, as the event horizon between two regions of the manifold. Hence it isn't surprising that, in terms of the r,t coordinates, we encounter singularities and infinities, because these coordinates are globally unique, viz., the Schwarzschild coordinate t is the essentially unique time coordinate for which the manifold is globally static.

Interestingly, the solution in Schwarzschild's 1916 paper was not presented in terms of what we today call Schwarzschild coordinates. Those were introduced a year later by Droste. Schwarzschild presented a line element that is formally identical to the one for which he is known, viz.,

In this formula the coordinates t, θ, and φ have their usual meanings, and the parameter α is to be identified with 2m as usual. However, he did not regard "R" as the physically significant radial distance from the center of the field. He begins by declaring a set of rectangular space coordinates x,y,z, and then defines the radial parameter r such that

r² = x² + y² + z²

Accordingly he relates these parameters to the angular coordinates θ and φ by the usual polar definitions

    x = r sin(θ) cos(φ)          y = r sin(θ) sin(φ)          z = r cos(θ)
He wishes to make use of the truncated field equations

which (as discussed in Section 5.8) requires that the determinant of the metric be constant. Remember that this was written in 1915 (formally conveyed by Einstein to the Prussian academy on 13 January 1916), and apparently Schwarzschild was operating under the influence of Einstein's conception of the condition g = -1 as a physical principle, rather than just a convenience enabling the use of the truncated field equations. In any case, this is the form that Schwarzschild set out to solve, and he realized that the most general spherically symmetrical static polar line element

    (ds)² = f(r)(dt)² - h(r)(dr)² - r²[(dθ)² + sin(θ)²(dφ)²]

where f and h are arbitrary functions of r, has the determinant g = -f(r)h(r)r⁴sin(θ)². (Schwarzschild actually included an arbitrary function of r on the angular terms of the line element, but that was superfluous.) To simplify the determinant condition he introduces the transformation

    x1 = r³/3          x2 = -cos(θ)          x3 = φ

from which we get the differentials

    dx1 = r² dr          dx2 = sin(θ) dθ          dx3 = dφ

Substituting these into the general line element gives the transformed line element

    (ds)² = f(r)(dt)² - [h(r)/r⁴](dx1)² - [r²/(1 - x2²)](dx2)² - r²(1 - x2²)(dx3)²
which has the determinant g = f(r)h(r). Schwarzschild then requires this to equal -1, so his derivation essentially assumes a priori that h(r) = 1/f(r). Interestingly, with this assumption it's easy to see that there is really only one function f(r) that can yield Kepler's laws of motion, as discussed in Section 5.5. Hence it could be argued that the field equations were superfluous to the determination of the spherically symmetrical static spacetime metric. On the other hand, the point of the exercise was to verify that this one physically viable metric is actually a solution of the field equations, thereby supporting their general applicability.

In any case, noting that r = (3x1)^(1/3) and sin(θ)² = 1 - (x2)², and with the stipulation that h(r) = 1/f(r), and that the metric go over to the Minkowski metric as r goes to infinity, Schwarzschild essentially showed that Einstein's field equations are satisfied by the above line element if f(r) = 1 - α/r where α is a constant of integration that "depends on the value of the mass at the origin". Naturally we take α = 2m for agreement with observation in the Newtonian limit. However, in the process of integrating the conditions on f(r) there appears another constant of integration, which Schwarzschild calls ρ. So the general solution is actually

    f = 1 - α/(r³ + ρ)^(1/3)
We ordinarily take  = 2m and  = 0 to give the usual result f(r) = 1 /r, but Schwarzschild was concerned to impose an additional constraint on the solution (beyond spherical symmetry, staticality, asymptotic flatness, and the field equations), which he expressed as "continuity of the [metric coefficients], except at r = 0". The metric coefficient h(r) = 1/f(r) is obviously discontinuous when f(r) vanishes, which is to say when r3 +  = 3. With the usual choice  = 0 this implies that the metric is discontinuous when r =  = 2m, which of course it is. This is the infamous Schwarzschild radius, where the usual Schwarzschild time coordinate becomes singular, representing the event horizon of a black hole. In retrospect, Schwarzschild's requirement for "continuity of the metric coefficients" is obviously questionable, since a discontinuity or singularity of a coordinate system is not generally indicative of a singularity in the manifold - the classical example being the singularity of polar coordinates at the North pole. Probably Schwarzschild meant to impose continuity on the manifold itself, rather than on the coordinates, but as Einstein remarked, "it is not so easy to free one's self from the idea that coordinates must have a direct metric significance". It's also somewhat questionable to impose continuity and absence of singularities except at the origin, because if this is a matter of principle, why should there be an exception, and why at the "origin" of the spherically symmetrical coordinate system?

Nevertheless, following along with Schwarzschild's thought, he obviously needs to require that the equality r³ + ρ = α³ be satisfied only when r = 0, which implies ρ = α³. Consequently he argues that the expression (r³ + ρ)^(1/3) should not be reduced to r. Instead, he defines the parameter R = (r³ + ρ)^(1/3), in terms of which the metric has the familiar form (1). Of course, if we put ρ = 0 then R = r and equation (1) reduces to the usual form of the Schwarzschild/Droste solution. However, with ρ = α³ we appear to have a physically distinct result, free of any coordinate singularity except at r = 0, which corresponds to the location R = α. The question then arises as to whether this is actually a physically distinct solution from the usual one. From the definitions of the quasi-orthogonal coordinates x,y,z we see that x = y = z = 0 when r = 0, but of course the x,y,z coordinates also take on negative values at various points of the manifold, and nothing prevents us from extending the solution to negative values of the parameter r, at least not until we arrive at the condition R = 0, which corresponds to r = -α. At this location it can be shown that we have a genuine singularity in the manifold, because the curvature scalar becomes infinite.

In terms of these coordinates the entire surface of the Schwarzschild horizon has the same spatial coordinates x = y = z = 0, but nothing prevents us from passing through this point into negative values of r. It may seem that by passing into negative values of x,y,z we are simply increasing r again, but this overlooks the duality of solutions to

    r² = x² + y² + z²
The distinction between the regions of positive and negative r is clearly shown in terms of polar coordinates, because the point in the equatorial plane with polar coordinates (r, 0) need not be identified with the point (-r, π). Essentially polar coordinates cover two separate planes, one with positive r and the other with negative r, and the only smooth path between them is through the boundary point r = 0. According to Schwarzschild's original conception of the coordinates, this boundary point is the event horizon, whereas the physical singularity in the manifold occurs at the surface of a sphere whose radius is r = 2m. In other words, the singularity at the "center" of the Schwarzschild solution occurs just on the other side of the boundary point r = 0 of these polar coordinates. We can shift this boundary point arbitrarily by simply shifting the "zero point" of the complete r scale, which actually extends from -∞ to +∞. However, none of this changes any of the proper intervals along any physical paths, because those are invariant under arbitrary (diffeomorphic) transformations. So Schwarzschild's version of the solution is not physically distinct from the usual interpretation introduced by Droste in 1917.

It's interesting that as late as 1935 (nearly two decades after Schwarzschild's death) Einstein proposed to eliminate the coordinate singularity in the (by then) conventional interpretation of the Schwarzschild solution by defining a radial coordinate u in terms of the Droste coordinate r by the relation u² = r - 2m. In terms of this coordinate the line element is

    (ds)² = [u²/(u² + 2m)](dt)² - 4(u² + 2m)(du)² - (u² + 2m)²[(dθ)² + sin(θ)²(dφ)²]
Einstein notes that as  ranges from - to + the corresponding values of r range from + down to 2m and them back to +, so he conceives of the complete solution as two identical sheets of physical space connected by the "bridge" at the boundary  = 0, where r = 2m and the determinant of the metric vanishes. This is called the Einstein-Rosen bridge. For values of r less than 2m he argues that "there are no corresponding real values of ". On this basis he asserts that the region r < 2m has been excluded from the solution. However, this is really just another re-expression of the original Schwarzschild solution, describing the "exterior" portions of the solution, but neglecting the interior portion, where  is imaginary. However, just as we can allow Schwarzschild's r to take on negative values, we can allow Einstein's  to take on imaginary values. The maximal analytic extension of the Schwarzschild solution necessarily includes the interior region, and it can't be eliminated simply by a change of variables. Ironically, the reason the manifold seems to be well-behaved across Einstein's "bridge" between the two exterior regions while jumping over the interior region is precisely that the  coordinate is locally ill-behaved at  = 0. Birkhoff proved that the Schwarzschild solution is the unique spherically symmetrical solution of the field equations, and it has been shown that the maximal analytic extension of this solution (called the Kruskal extension) consists of two exterior regions connected by the internal region, and contains a genuine manifold singularity.

On the other hand, just because the maximally extended Schwarzschild solution satisfies the field equations, it doesn't necessarily follow that such a thing exists. In fact, there is no known physical process that would produce this configuration, since it requires two asymptotically flat regions of spacetime that happen to become connected at a singularity, and there is no reason to believe that such a thing would ever happen. In contrast, it's fairly plausible that some part of the complete Schwarzschild solution could be produced, such as by the collapse of a sufficiently massive star. The implausibility of the maximally extended solutions doesn't preclude the existence of black holes - although it does remind us to be cautious about assuming the actual existence of things just because they are solutions of the field equations.

Despite the implausibility of an Einstein-Rosen bridge connecting two distinct sheets of spacetime, this idea has recently gained widespread attention, the term "bridge" having been replaced with "wormhole". It's been speculated that under certain conditions it might be possible to actually traverse a wormhole, passing from one region of spacetime to another. As discussed above this is definitely not possible for the Schwarzschild solution, because of the unavoidable singularity, but people have recently explored the possibilities of traversable wormholes. Naturally if such direct conveyance between widely separate regions of spacetime were possible, and if those regions were also connected by (much longer) ordinary timelike paths, this raises the prospect of various kinds of "time travel", assuming a wormhole connected to the past was somehow established and maintained. However, these rather far-fetched scenarios all rely on the premise of negative energy density, which of course violates the so-called "null energy condition", not to mention the weak, strong, and dominant energy conditions of classical relativity. In other words, on the basis of classical relativity and the traditional energy conditions we could rule out traversable wormholes altogether. It is only the fact that some quantum phenomena do apparently violate these energy conditions (albeit very slightly) that leaves open the remote possibility of such things.

8.8 Who Invented Relativity?

All beginnings are obscure. H. Weyl

There have been many theories of relativity throughout history, from the astronomical speculations of Heraclides to the geometry of Euclid to the classical theory of space, time, and dynamics developed by Galileo, Newton and others. Each of these was based on one or more principles of relativity. However, when we refer to the “theory of relativity” today, we usually mean one particular theory of relativity, namely, the body of ideas developed near the beginning of the 20th century and closely identified with the work of Albert Einstein. These ideas are distinguished from previous theories not by relativity itself, but by the way in which relativistically equivalent coordinate systems are related to each other.

One of the interesting historical aspects of the modern relativity theory is that, although often regarded as the highly original and even revolutionary contribution of a single individual, almost every idea and formula of the theory had been anticipated by others. For example, Lorentz covariance and the inertia of energy were both (arguably) implicit in Maxwell’s equations. Also, Voigt formally derived the Lorentz transformations in 1887 based on general considerations of the wave equation. In the context of electrodynamics, Fitzgerald, Larmor, and Lorentz had all, by the 1890s, arrived at the Lorentz transformations, including all the peculiar "time dilation" and "length contraction" effects (with respect to the transformed coordinates) associated with Einstein's special relativity. By 1905, Poincare had clearly articulated the principle of relativity and many of its consequences, had pointed out the lack of empirical basis for absolute simultaneity, had challenged the ontological significance of the ether, and had even demonstrated that the Lorentz transformations constitute a group in the same sense as do Galilean transformations. In addition, the crucial formal synthesis of space and time into spacetime was arguably the contribution of Minkowski in 1907, and the dynamics of special relativity were first given in modern form by Lewis and Tolman in 1909. Likewise, the Riemann curvature and Ricci tensors for n-dimensional manifolds, the tensor formalism itself, and even the crucial Bianchi identities, were all known prior to Einstein’s development of general relativity in 1915. In view of this, is it correct to regard Einstein as the sole originator of modern relativity?

The question is complicated by the fact that relativity is traditionally split into two separate theories, the special and general theories, corresponding to the two phases of Einstein's historical development, and the interplay between the ideas of Einstein and those of his predecessors and contemporaries is different in the two cases. In addition, the title of Einstein’s 1905 paper (“On the Electrodynamics of Moving Bodies”) encouraged the idea that it was just an interpretation of Lorentz's theory of electrodynamics. Indeed, Wilhelm Wien proposed that the Nobel prize of 1912 be awarded jointly to Lorentz and Einstein, saying

The principle of relativity has eliminated the difficulties which existed in electrodynamics and has made it possible to predict for a moving system all electrodynamic phenomena which are known for a system at rest... From a purely logical point of view the relativity principle must be considered as one of the most significant accomplishments ever achieved in theoretical physics... While Lorentz must be considered as the first to have found the mathematical content of relativity, Einstein succeeded in reducing it to a simple principle. One should therefore assess the merits of both investigators as being comparable.

As it happens, the physics prize for 1912 was awarded to Nils Gustaf Dalen (for the "invention of automatic regulators for lighting coastal beacons and light buoys during darkness or other periods of reduced visibility"), and neither Einstein, Lorentz, nor anyone else was ever awarded a Nobel prize for either the special or general theories of relativity. This is sometimes considered to have been an injustice to Einstein, although in retrospect it's conceivable that a joint prize for Lorentz and Einstein in 1912, as Wien proposed, assessing "the merits of both investigators as being comparable", might actually have diminished Einstein's subsequent popular image as the sole originator of both special and general relativity.

On the other hand, despite the somewhat misleading title of Einstein’s paper, the second part of the paper (“The Electrodynamic Part”) was really just an application of the general theoretical framework developed in the first part of the paper (“The Kinematic Part”). It was in the first part that special relativity was founded, with consequences extending far beyond Lorentz's electrodynamics. As Einstein later recalled,

The new feature was the realization that the bearing of the Lorentz transformation transcended its connection with Maxwell's equations and was concerned with the nature of space and time in general.

To give just one example, we may note that prior to the advent of special relativity the experimental results of Kaufmann and others involving the variation of an electron’s mass with velocity were thought to imply that all of the electron’s mass must be electromagnetic in origin, whereas Einstein’s kinematics revealed that all mass – regardless of its origin – would necessarily be affected by velocity in the same way. Thus an entire research program, based on the belief that the high-speed behavior of objects represented dynamical phenomena, was decisively undermined when Einstein showed that the phenomena in question could be interpreted much more naturally on a purely kinematic basis. Now, if this interpretation applied only to electrodynamics, its significance might be debatable, but already by 1905 it was clear that, as Einstein put it, “the Lorentz transformation transcended its connection with Maxwell’s equations”, and must apply to all physical phenomena in order to account for the complete inability to detect absolute motion. Once this is recognized, it is clear that we are dealing not just with properties of electricity and magnetism, or any other specific entities, but with the nature of space and time themselves. This is the aspect of Einstein's 1905 theory that prompted Witkowski, after reading vol. 17 of Annalen der Physik, to exclaim: "A new Copernicus is born! Read Einstein's paper!" The comparison is apt, because the contribution of Copernicus was, after all, essentially nothing but an interpretation of Ptolemy’s astronomy, just as Einstein's theory was an interpretation of Lorentz's electrodynamics. Only subsequently did men like Kepler, Galileo, and Newton, taking the Copernican insight even more seriously than Copernicus himself had done, develop a substantially new physical theory. It's clear that Copernicus was only one of several people who jointly created the "Copernican revolution" in science, and we can argue similarly that Einstein was only one of several individuals (including Maxwell, Lorentz, Poincare, Planck, and Minkowski) responsible for the "relativity revolution".
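To state the kinematic point about mass and velocity explicitly (in modern notation, which is of course not how Kaufmann's contemporaries expressed it), the momentum and energy of a body of rest mass m are

    \[ p = \frac{mv}{\sqrt{1 - v^2/c^2}}, \qquad E = \frac{mc^2}{\sqrt{1 - v^2/c^2}}, \]

so the velocity dependence enters only through the kinematic factor 1/sqrt(1 - v^2/c^2), and therefore applies to every form of mass, whatever its origin.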

The historical parallel between special relativity and the Copernican model of the solar system is not merely superficial, because in both cases the starting point was a pre-existing theoretical structure based on the naive use of a particular system of coordinates lacking any inherent physical justification. On the basis of these traditional but eccentric coordinate systems it was natural to imagine certain consequences, such as that both the Sun and the planet Venus revolve around a stationary Earth in separate orbits. However, with the newly-invented telescope, Galileo was able to observe the phases of Venus, clearly showing that Venus moves in (roughly) a circle around the Sun. In this way the intrinsic patterns of the celestial bodies became better understood, but it was still possible (and still is possible) to regard the Earth as stationary in an absolute extrinsic sense. In fact, for many purposes we continue to do just that, but from an astronomical standpoint we now almost invariably regard the Sun as the "center" of the solar system. Why? The Sun too is moving among the stars in the galaxy, and the galaxy itself is moving relative to other galaxies, so on what basis do we decide to regard the Sun as the "center" of the solar system?

The answer is that the Sun is the inertial center. In other words, the Copernican revolution (as carried to its conclusion by the successors of Copernicus) can be summarized as the adoption of inertia as the prime organizing principle for the understanding and description of nature. The concept of physical inertia was clearly identified, and the realization of its significance evolved and matured through the works of Kepler, Galileo, Newton, and others. Nature is most easily and most perspicuously described in terms of inertial coordinates. Of course, it remains possible to adopt some non-inertial system of coordinates with respect to which the Earth can be regarded as the stationary center, but there is no longer any imperative to do this, especially since we cannot thereby change the fact that Venus circles the Sun, i.e., we cannot change the intrinsic relations between objects, and those intrinsic relations are most readily expressed in terms of inertial coordinates.

Likewise the pre-existing theoretical structure in 1905 described events in terms of coordinate systems that were not clearly understood and were lacking in physical justification. It was natural within this framework to imagine certain consequences, such as anisotropy in the speed of light, i.e., directional dependence of light speed resulting from the Earth's motion through the (assumed stationary) ether. This was largely motivated by the idea that light consists of a wave in the ether, and therefore is not an inertial phenomenon. However, experimental physicists in the late 1800's began to discover facts analogous to the phases of Venus, e.g., the symmetry of electromagnetic induction, the "partial convection" of light in moving media, the isotropy of light speed with respect to relatively moving frames of reference, and so on. Einstein accounted for all these results by showing that they were perfectly natural if things are described in terms of inertial coordinates - provided we apply a more profound understanding of the definition and physical significance of such coordinate systems and the relationships between them.

As a result of the first inertial revolution (initiated by Copernicus), physicists had long been aware of the existence of a preferred class of coordinate systems - the inertial systems - with respect to which inertial phenomena are isotropic. These systems are equivalent up to orientation and uniform motion in a straight line, and it had always been tacitly assumed that the transformation from one system in this class to another was given by a Galilean transformation. The fundamental observations in conflict with this assumption were those involving electric and magnetic fields that collectively implied Maxwell's equations of electromagnetism. These equations are not invariant under Galilean transformations, but they are invariant under Lorentz transformations. The discovery of Lorentz invariance was similar to the discovery of the phases of Venus, in the sense that it irrevocably altered our awareness of the intrinsic relations between events. We can still go on using coordinate systems related by Galilean transformations, but we now realize that only one of those systems (at most) is a truly inertial system of coordinates.
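As a concrete illustration of this asymmetry, here is a minimal sketch (in Python, using the sympy library; the wave profile f and the coordinate labels are merely illustrative) which applies the one-dimensional wave operator to a disturbance expressed in coordinates related to x, t by a Galilean and then by a Lorentz transformation:

    import sympy as sp

    x, t, c, v = sp.symbols('x t c v', positive=True)
    f = sp.Function('f')                      # an arbitrary wave profile
    gamma = 1 / sp.sqrt(1 - v**2 / c**2)

    def wave_operator(phi):
        # d^2(phi)/dx^2 - (1/c^2) d^2(phi)/dt^2 in the original coordinates
        return sp.diff(phi, x, 2) - sp.diff(phi, t, 2) / c**2

    # A right-moving wave, phi = f(x' - c t'), written in the primed coordinates.

    # Galilean transformation:  x' = x - v t,  t' = t
    phi_gal = f((x - v*t) - c*t)
    print(sp.simplify(wave_operator(phi_gal)))    # nonzero unless v = 0

    # Lorentz transformation:  x' = gamma (x - v t),  t' = gamma (t - v x / c^2)
    phi_lor = f(gamma*(x - v*t) - c*gamma*(t - v*x/c**2))
    print(sp.simplify(wave_operator(phi_lor)))    # identically zero

The wave equation, and hence the intrinsic pattern of electromagnetic phenomena it describes, keeps its form under the Lorentz transformation but not under the Galilean one.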

Incidentally, the electrodynamic theory of Lorentz was in some sense analogous to Tycho Brahe's model of the solar system, in which the planets revolve around the Sun but the Sun revolves around a stationary Earth. Tycho's model was kinematically equivalent to Copernicus' Sun-centered model, but expressed – awkwardly – in terms of a coordinate system with respect to which the Earth is stationary, i.e., a non-inertial coordinate system.

It's worth noting that we define inertial coordinates just as Galileo did, i.e., systems of coordinates with respect to which inertial phenomena are isotropic, so our definition hasn't changed. All that has changed is our understanding of the relations between inertial coordinate systems. Einstein's famous "synchronization procedure" (which was actually first proposed by Poincare) was expressed in terms of light rays, but the physical significance of this procedure is due to the empirical fact that it yields exactly the same synchronization as does Galileo's synchronization procedure based on mechanical inertia. To establish simultaneity between spatially separate events while floating freely in empty space, throw two identical objects in opposite directions with equal force, so that the thrower remains stationary in his original frame of reference. These objects then pass equal distances in equal times, i.e., they serve to assign inertially simultaneous times to separate events as they move away from each other. In this way we can theoretically establish complete slices of inertial simultaneity in spacetime, based solely on the inertial behavior of material objects. Someone moving uniformly relative to us can carry out this same procedure with respect to his own inertial frame of reference and establish his own slices of inertial simultaneity throughout spacetime. The unavoidable intrinsic relations that were discovered at the end of the 19th century show that these two sets of simultaneity slices are not identical. The two main approaches to the interpretation of these facts were discussed in Sections 1.5 and 1.6. The approach advocated by Einstein was to adhere to the principle of inertia as the basis for organizing our understanding and descriptions of physical phenomena - which was certainly not a novel idea.
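The disagreement between the two families of simultaneity slices can be read off directly from the Lorentz transformation for the time coordinate (quoted here in its standard form):

    \[ t' = \frac{t - vx/c^2}{\sqrt{1 - v^2/c^2}} \]

A locus of constant t' therefore corresponds, in terms of the unprimed coordinates, to t = (v/c^2)x + constant, so the moving observer's slices of inertial simultaneity are tilted relative to ours by the factor v/c^2, coinciding with ours only when v = 0.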

In his later years Einstein observed "there is no doubt that the Special Theory of Relativity, if we regard its development in retrospect, was ripe for discovery in 1905". The person (along with Lorentz) who most nearly anticipated Einstein's special relativity was undoubtedly Poincare, who had already in 1900 proposed an explicitly operational definition of clock synchronization and in 1904 suggested that the ether was in principle undetectable to all orders of v/c. Those two propositions and their consequences essentially embody the whole of special relativity. Nevertheless, as late as 1909 Poincare was not prepared to say that the equivalence of all inertial frames combined with the invariance of (two-way) light speed were sufficient to infer Einstein's model. He maintained that one must also stipulate a particular contraction of physical objects in their direction of motion. This is sometimes cited as evidence that Poincare still failed to understand the situation, but there's a sense in which he was actually correct. The two famous principles of Einstein's 1905 paper are not sufficient to uniquely identify special relativity, as Einstein himself later acknowledged. One must also stipulate, at the very least, homogeneity, memorylessness, and isotropy. Of these, the first two are rather innocuous, and one could be forgiven for failing to explicitly mention them, but not so the assumption of isotropy, which serves precisely to single out Einstein's simultaneity convention from all the other - equally viable - interpretations. (See Section 4.5). This is also precisely the aspect that is fixed by Poincare's postulate of contraction as a function of velocity.

In a sense, the failure of Poincare to found the modern theory of relativity was not due to a lack of discernment on his part (he clearly recognized the Lorentz group of space and time transformations), but rather to an excess of discernment and philosophical sophistication, preventing him from subscribing to the young patent examiner's inspired but perhaps slightly naive enthusiasm for the symmetrical interpretation, which is, after all, only one of infinitely many possibilities. Poincare recognized too well the extent to which our physical models are both conventional and provisional. In retrospect, Poincare's scruples have the appearance of someone arguing that we could just as well regard the Earth rather than the Sun as the center of the solar system, i.e., his reservations were (and are) technically valid, but in some sense misguided. Also, as Max Born remarked, to the end of Poincare’s life his expositions of relativity “definitely give you the impression that he is recording Lorentz’s work”, and yet “Lorentz never claimed to be the author of the principle of relativity”, but invariably attributed it to Einstein. Indeed Lorentz himself often expressed reservations about the relativistic interpretation.

Regarding Born’s impression that Poincare was just “recording Lorentz’s work”, it should be noted that Poincare habitually wrote in a self-effacing manner. He named many of his discoveries after other people, and expounded many important and original ideas in writings that were ostensibly just reviewing the works of others, with “minor amplifications and corrections”. So, we shouldn’t be misled by Born’s impression. Poincare always gave the impression that he was just recording someone else’s work – in contrast with Einstein, whose style of writing, as Born said, “gives you the impression of quite a new venture”. Of course, Born went on to say, when recalling his first reading of Einstein’s paper in 1907, “Although I was quite familiar with the relativistic idea and the Lorentz transformations, Einstein’s reasoning was a revelation to me… which had a stronger influence on my thinking than any other scientific experience”.

Lorentz’s reluctance to fully embrace the relativity principle (that he himself did so much to uncover) is partly explained by his belief that "Einstein simply postulates what we have deduced... from the equations of the electromagnetic field". If this were true, it would be a valid reason for preferring Lorentz's approach. However, if we closely examine Lorentz's electron theory we find that full agreement with experiment required not only the invocation of Fitzgerald's contraction hypothesis, but also the assumption that mechanical inertia is Lorentz covariant. It's true that, after Poincare complained about the proliferation of hypotheses, Lorentz realized that the contraction could be deduced from more fundamental principles (as discussed in Section 1.5), but this was based on yet another hypothesis, the so-called molecular force hypothesis, which simply asserts that all physical forces and configurations (including the unknown forces that maintain the shape of the electron) transform according to the same laws as do electromagnetic forces. Needless to say, it obviously cannot follow deductively "from the equations of the electromagnetic field" that the necessarily non-electromagnetic forces which hold the electron together must transform according to the same laws. (Both Poincare and Einstein had already realized by 1905 that the mass of the electron cannot be entirely electromagnetic in origin.) Even less can the Lorentz covariance of mechanical inertia be deduced from electromagnetic theory. We still do not know to this day the origin of inertia, so there is no sense in which Lorentz or anyone else can claim to have deduced Lorentz covariance in any constructive sense, let alone from the laws of electromagnetism.

Hence Lorentz's molecular force hypothesis and his hypothesis of covariant mechanical inertia together are simply a disguised and piece-meal way of postulating universal Lorentz invariance - which is precisely what Lorentz claims to have deduced rather than postulated. The whole task was to reconcile the Lorentzian covariance of electromagnetism with the Galilean covariance of mechanical dynamics, and Lorentz simply recognized that one way of doing this is to assume that mechanical dynamics (i.e., inertia) is actually Lorentz covariant. This is presented as an explicit postulate (not a deduction) in the final edition of his book on the Electron Theory. In essence, Lorentz’s program consisted of performing a great deal of deductive labor, at the end of which it was still necessary, in order to arrive at results that agreed with experiment, to simply postulate the same principle that forms the basis of special relativity. (To his credit, Lorentz candidly acknowledged that his deductions were "not altogether satisfactory", but this is actually an understatement, because in the end he simply postulated what he claimed to have deduced.)

In contrast, Einstein recognized the necessity of invoking the principle of relativity and Lorentz invariance at the start, and then demonstrated that all the other "constructive" labor involved in Lorentz's approach was superfluous, because once we have adopted these premises, all the experimental results arise naturally from the simple kinematics of the situation, with no need for molecular force hypotheses or any other exotic and dubious conjectures regarding the ultimate constitution of matter. On some level Lorentz grasped the superiority of the purely relativistic approach, as is evident from the words he included in the second edition of his "Theory of Electrons" in 1916:

If I had to write the last chapter now, I should certainly have given a more prominent place to Einstein's theory of relativity by which the theory of electromagnetic phenomena in moving systems gains a simplicity that I had not been able to attain. The chief cause of my failure was my clinging to the idea that the variable t only can be considered as the true time, and that my local time t' must be regarded as no more than an auxiliary mathematical quantity.

Still, it's clear that neither Lorentz nor Poincare ever whole-heartedly embraced special relativity, for reasons that may best be summed up by Lorentz when he wrote

Yet, I think, something may also be claimed in favor of the form in which I have presented the theory. I cannot but regard the aether, which can be the seat of an electromagnetic field with its energy and its vibrations, as endowed with a certain degree of substantiality, however different it may be from all ordinary matter. In this line of thought it seems natural not to assume at starting that it can never make any difference whether a body moves through the aether or not, and to measure distances and lengths of time by means of rods and clocks having a fixed position relatively to the aether.

This passage implies that Lorentz's rationale for retaining a substantial aether and attempting to refer all measurements to the rest frame of this aether (without, of course, specifying how that is to be done) was the belief that it might, after all, make some difference whether a body moves through the aether or not. In other words, we should continue to look for physical effects that violate Lorentz invariance (by which we now mean local Lorentz invariance), both in new physical forces and at higher orders of v/c for the known forces. A century later, our present knowledge of the weak and strong nuclear forces and the precise behavior of particles at 0.99999c has vindicated Einstein's judgment that Lorentz invariance is a fundamental principle whose significance and applicability extends far beyond Maxwell's equations, and apparently expresses a general attribute of space and time, rather than a specific attribute of particular physical entities.

In addition to the formulas expressing the Lorentz transformations, we can also find precedents for other results commonly associated with special relativity, such as the equivalence of mass and energy. In fact, the general idea of associating mass with energy in some way had been around for about 25 years prior to Einstein's 1905 papers. Indeed, as Thomson and even Einstein himself noted, this association is already implicit in Maxwell's theory. With electric and magnetic fields e and b, the energy density is (e^2 + b^2)/(8π) and the momentum density is (e × b)/(4πc), so in the case of radiation (when e and b are equal and orthogonal) the energy density is E = e^2/(4π) and the momentum density is p = e^2/(4πc). Taking momentum p as the product of the radiation's "mass" m times its velocity c, we have

    mc = e^2/(4πc),   i.e.,   m = e^2/(4πc^2) = E/c^2

and so E = mc^2. Indeed, in the 1905 paper containing his original deduction of mass-energy equivalence, Einstein acknowledges that it was explicitly based on "Maxwell's expression for the electromagnetic energy of space". We can also mention the pre-1905 work of Poincare and others on the electron mass arising from its energy, and the work of Hasenohrl on how the mass of a cavity increases when it is filled with radiation. However, these suggestions were all very restricted in their applicability, and didn't amount to the assertion of a fundamental equivalence such as emerges so clearly from Einstein's relativistic interpretation. Hardly any of the formulas in Einstein's two 1905 papers on relativity were new, but what Einstein provided was a single conceptual framework within which all those formulas flow quite naturally from a simple set of general principles.

Occasionally one hears of other individuals who are said to have discovered one or more aspects of relativity prior to Einstein. To take just one example, in November of 1999 there appeared in newspapers around the world a story claiming that "The mathematical equation that ushered in the atomic age was discovered by an unknown Italian dilettante two years before Albert Einstein used it in developing the theory of relativity...". The "dilettante" in question was an Italian businessman named Olinto De Pretto, and the implication of the story was that Einstein got the idea for mass-energy equivalence from "De Pretto's insight". There are some obvious difficulties with this account, only some of which can be blamed on the imprecision of popular journalism. First, the story claimed that Einstein used the idea of mass-energy equivalence to develop special relativity, whereas in fact the idea that energy has inertia appeared in a very brief note that Einstein submitted for publication toward the end of 1905, after (and as a consequence of) the original paper on special relativity.

The newspaper report went on to say that "De Pretto had stumbled on the equation, but not the theory of relativity... It was republished in 1904 by Veneto's Royal Science Institute... A Swiss Italian named Michele Besso alerted Einstein to the research and in 1905 Einstein published his own work..." Now, it's certainly true that Besso was Italian, and worked with Einstein at the Bern Patent Office during the years leading up to 1905, and it's true that they discussed physics, and Besso provided Einstein with suggestions for reading (for example, it was Besso who introduced him to the works of Ernst Mach). However, there is no evidence that Besso ever “alerted Einstein” to De Pretto’s paper. Moreover, the idea that Einstein’s second relativity paper in 1905 (let alone the first) was in any way prompted or inspired by De Pretto's rather silly and unoriginal comments is bizarre.

In essence, De Pretto's "insight" was the (hardly novel) idea that matter consists of tiny particles, and that these particles are agitated by their exposure to an ultra-mundane flux of hypothetical ether particles in a "shadow theory" of gravity. Supposing that these ether particles move at the speed of light (or perhaps at the "speed of electricity", which he believed was significantly higher), De Pretto reasoned – in a qualitative way – that the mean vibrational speed of the particles of matter must approach the speed of the ether particles, i.e., the speed of light. He then asserted (erroneously) that the kinetic energy of a mass m moving at speed v is mv^2, which is actually Leibniz's "vis viva", the living force. On this basis, De Pretto asserted that the kinetic energy in a quantity of mass m would be mc^2, which, he did not fail to notice, is a lot of energy. However, this line of reasoning was not original to De Pretto. The shadow theory of gravity was first conceived by Newton's friend Nicholas Fatio in the 1690's, and subsequently re-discovered by many individuals, notably George Louis Lesage in the late 18th century. Furthermore, the realization that the bombardment of such an intense ultramundane flux would necessarily heat ordinary matter to incredible temperatures in a fraction of a second was noted by both Kelvin and Maxwell in the late 19th century. Poincare and Lorentz both realized the same thing, and used this fact to conclude that the shadow model of gravity is not viable (since it entails the vaporization of the Earth in a fraction of a second). Hence, far from contributing any new "insight", De Pretto's only contribution was a lack of insight, in blithely ignoring the preposterous thermodynamic implications of this (very old) idea. Needless to say, none of this bears any resemblance to the concept of mass-energy equivalence that emerges from special relativity.
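For comparison, in standard notation: Leibniz's vis viva is mv^2, the classical kinetic energy is half of that, namely (1/2)mv^2, and the relativistic kinetic energy of a body of rest mass m is

    \[ K = \left( \frac{1}{\sqrt{1 - v^2/c^2}} - 1 \right) mc^2, \]

which reduces to (1/2)mv^2 for small v. The relativistic rest energy mc^2, by contrast, is present even for a body at rest; De Pretto's "mc^2" arises from applying the vis viva formula to an impossible internal speed, which is a different thing altogether.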

Of course, this is not to say that Einstein had no predecessors in working toward the genuine concept of mass-energy equivalence. Some form of this idea was already to be found in the writings of Thomson, Lorentz, Poincare, etc. (not to mention Isaac Newton, who famously asked "Are not gross bodies and light convertible into one another...?"). After all, the idea that the electron's mass was electromagnetic in origin was one of the leading hypotheses of research at that time. It would be like saying that some theoretical physicist today had never heard of string theory! But it’s clear that mass-energy equivalence did not inspire Einstein’s development of special relativity, because it isn’t mentioned in the foundational paper of 1905. Only a few months later did he recognize this implication of the theory, prompting him to write in a letter to his close friend Conrad Habicht as he was preparing the paper on mass-energy equivalence:

One more consequence of the paper on electrodynamics has also occurred to me. The principle of relativity, in conjunction with Maxwell's equations, requires that mass be a direct measure of the energy contained in a body; light carries mass with it. A noticeable decrease of mass should occur in the case of radium [as it emits radiation]. The argument [which he intends to present in the paper] is amusing and seductive, but for all I know the Lord might be laughing over it and leading me around by the nose.

These are clearly the words of someone who is genuinely working out the consequences of his own recent paper, and wondering about their validity, not someone who has gotten an idea from seeing a formula in someone else's paper. Of course, the most obvious proof of the independence of Einstein’s path to special relativity is simply the wonderfully lucid sequence of thoughts presented in his 1905 paper, beginning from first principles and a careful examination of the physical significance of time and space, and leading to the kinematics of special relativity, from which the inertia of energy emerges naturally.

Nevertheless, we shouldn't underestimate the real contributions to the development of special relativity made by Einstein's predecessors, most notably Lorentz and Poincare. In addition, although Einstein was remarkably thorough in his 1905 paper, there were nevertheless important contributions to the foundations of special relativity made by others in the years that followed. For example, in 1907 Max Planck greatly clarified relativistic mechanics, basing it on the conservation of momentum with his "more advantageous" definition of force, as did Tolman and Lewis. Planck also critiqued Einstein's original deduction of mass-energy equivalence, and gave a more general and comprehensive argument. (This led Johannes Stark in 1907 to cite Planck as the originator of mass-energy equivalence, prompting an angry letter from Einstein saying that he "was rather disturbed that you do not acknowledge my priority with regard to the connection between mass and energy". In later years Stark became an outspoken critic of Einstein's work.)
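Planck's "more advantageous" definition of force is usually identified with taking force to be the rate of change of the relativistic momentum,

    \[ F = \frac{d}{dt}\left( \frac{mv}{\sqrt{1 - v^2/c^2}} \right), \]

which preserves the momentum-conservation structure of mechanics at all speeds and avoids the awkward distinction between "longitudinal" and "transverse" mass that appeared in earlier treatments.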

Another crucially important contribution was made by Hermann Minkowski (one of Einstein's former professors), who recognized that what Einstein had described was simply ordinary kinematics in a four-dimensional spacetime manifold with the pseudo-metric

    (dτ)^2 = (dt)^2 - (dx)^2 - (dy)^2 - (dz)^2

(in units with c = 1). Poincare had also recognized this as early as 1905. This was vital for the generalization of relativity which Einstein – with the help of his old friend Marcel Grossmann – developed on the basis of the theory of curved manifolds developed in the 19th century by Gauss and Riemann.

The tensor calculus and generally covariant formalism employed by Einstein in his general theory had been developed by Gregorio Ricci-Curbastro and Tullio Levi-Civita around 1900 at the University of Padua, building on the earlier work of Gauss, Riemann, Beltrami, and Christoffel. In fact, the main technical challenge that occupied Einstein in his efforts to find a suitable field law for gravity, which was to construct from the metric tensor another tensor whose covariant derivative automatically vanishes, had already been solved in the form of the Bianchi identities, which lead directly to the Einstein tensor as discussed in Section 5.8.
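In modern notation, the contracted Bianchi identities assert that

    \[ \nabla_\mu \left( R^{\mu\nu} - \tfrac{1}{2} g^{\mu\nu} R \right) = 0, \]

so the combination G^{μν} = R^{μν} - (1/2)g^{μν}R, the Einstein tensor, is exactly a tensor built from the metric whose covariant divergence vanishes identically, making it the natural counterpart of the (likewise divergence-free) energy-momentum tensor.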

Several other individuals are often cited as having anticipated some aspect of general relativity, although not in any sense of contributing seriously to the formulation of the theory. John Mitchell wrote in 1783 about the possibility of "dark stars" that were so massive light could not escape from them, and Laplace contemplated the same possibility in 1796. Around 1801 Johann von Soldner predicted that light rays passing near the Sun would be deflected by the Sun’s gravity, just like a small corpuscle of matter moving at the speed of light. (Ironically, although Newton’s theory implies a deflection of just half the relativistic value, Soldner erroneously omitted a factor of 1/2 from his calculation, so he arrived at the relativistic value, albeit by a computational error.) William Clifford wrote about a possible connection between matter and curved space in 1873.
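A quick numerical check of that factor of two, using standard values for the solar mass and radius (a sketch; the numbers are not Soldner's own):

    # Deflection of a light ray grazing the Sun: the Newtonian (corpuscular)
    # value versus the general-relativistic value, in arcseconds.
    import math

    G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
    M = 1.989e30         # mass of the Sun, kg
    R = 6.96e8           # radius of the Sun, m
    c = 2.998e8          # speed of light, m/s

    rad_to_arcsec = (180.0 / math.pi) * 3600.0
    newtonian    = 2 * G * M / (c**2 * R)     # half the relativistic value
    relativistic = 4 * G * M / (c**2 * R)

    print("Newtonian:    %.2f arcsec" % (newtonian * rad_to_arcsec))     # about 0.87
    print("relativistic: %.2f arcsec" % (relativistic * rad_to_arcsec))  # about 1.75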

Interestingly, the work of Soldner had been virtually forgotten until being rediscovered and publicized by Philipp Lenard in 1921, along with the claim that Hasenohrl should be credited with the mass-energy equivalence relation. Similarly in 1917 Ernst Gehrcke arranged for the re-publication of an 1898 paper by a secondary school teacher named Paul Gerber which contained a formula for the precession of elliptical orbits identical to the one Einstein had derived from the field equations of general relativity. Gerber's approach was based on the premise that the gravitational potential propagates at the speed of light, and that the effect of the potential on the motion of a body depends on the body's velocity through the potential field. His potential was similar in form to the Gauss-Weber theories. However, Gerber's "theory" was (and still is) regarded as unsatisfactory, mainly because his conclusions don’t follow from his premises, but also because the combination of Gerber's proposed gravitational potential with the rest of (nonrelativistic) physics results in predictions (such as 3/2 the relativistic prediction for the deflection of light rays near the Sun) which are inconsistent with observation. In addition, Gerber's free mixing of propagating effects with some elements of action-at-a-distance tended to undermine the theoretical coherence of his proposal.
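The precession formula in question, which also follows from the field equations of general relativity, gives about 43 seconds of arc per century for Mercury. A rough check, with orbital elements taken from standard references:

    # Excess perihelion precession of Mercury, using the relativistic formula
    # 6*pi*G*M / (c^2 * a * (1 - e^2)) radians per orbit.
    import math

    G = 6.674e-11          # gravitational constant, m^3 kg^-1 s^-2
    M = 1.989e30           # mass of the Sun, kg
    c = 2.998e8            # speed of light, m/s
    a = 5.791e10           # semi-major axis of Mercury's orbit, m
    e = 0.2056             # orbital eccentricity of Mercury
    T = 87.969             # orbital period of Mercury, days

    per_orbit = 6 * math.pi * G * M / (c**2 * a * (1 - e**2))
    per_century = per_orbit * (36525.0 / T) * (180.0 / math.pi) * 3600.0
    print("excess precession: %.1f arcsec per century" % per_century)    # about 43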

The writings of Mitchell, Soldner, Gerber, and others were, at most, anticipations of some of the phenomenology later associated with general relativity, but had nothing to do with the actual theory of general relativity, i.e., a theory that conceives of gravity as a manifestation of the curvature of spacetime. A closer precursor can be found in the notional writings of William Kingdon Clifford, but like Gauss and Riemann he lacked the crucial idea of including time as one of the dimensions of the manifold. As noted above, the formal means of treating space and time as a single unified spacetime manifold was conceived by Poincare and Minkowski, and the tensor calculus was developed by Ricci and Levi-Civita, with whom Einstein corresponded during the development of general relativity. It’s also worth mentioning that Einstein and Grossmann, working in collaboration, came very close to discovering the correct field equations in 1913, but were diverted by an erroneous argument that led them to believe no fully covariant equations could be consistent with experience. In retrospect, this accident may have been all that prevented Grossmann from being perceived as a co-creator of general relativity. On the other hand, Grossmann had specifically distanced himself from the physical aspects of the 1913 paper, and Einstein wrote to Sommerfeld in July 1915 (i.e., prior to arriving at the final form of the field equations) that

Grossman will never lay claim to being co-discoverer. He only helped in guiding me through the mathematical literature but contributed nothing of substance to the results.

In the summer of 1915 Einstein gave a series of lectures at Gottingen on the general theory, and apparently succeeded in convincing both Hilbert and Klein that he was close to an important discovery, despite the fact that he had not yet arrived at the final form of the field equations. Hilbert took up the problem from an axiomatic standpoint, and carried on an extensive correspondence with Einstein until the 19th of November. On the 20th, Hilbert submitted a paper to the Gesellschaft der Wissenschaften in Gottingen with a derivation of the field equations. Five days later, on 25 November, Einstein submitted a paper with the correct form of the field equations to the Prussian Academy in Berlin. The exact sequence of events leading up to the submittal of these two papers – and how much Hilbert and Einstein learned from each other – is somewhat murky, especially since Hilbert’s paper was not actually published until March of 1916, and seems to have undergone some revisions from what was originally submitted. However, the question of who first wrote down the fully covariant field equations (including the trace term) is less significant than one might think, because, as Einstein wrote to Hilbert on 18 November after seeing a draft of Hilbert’s paper

The difficulty was not in finding generally covariant equations for the g’s; for this is easily achieved with the aid of Riemann’s tensor. Rather, it was hard to recognize that these equations are a generalization – that is, a simple and natural generalization – of Newton’s law.

It might be argued that Einstein was underestimating the mathematical difficulty, since he hadn’t yet included the trace term in his published papers, but in fact he repeated the same comment in a letter to Sommerfeld on 28 November, this time explicitly referring to the full field equations, with the trace term. He wrote

It is naturally easy to set these generally covariant equations down; however, it is difficult to recognize that they are generalizations of Poisson’s equations, and not easy to recognize that they fulfill the conservation laws. I had considered these equations with Grossmann already 3 years ago, with the exception of the [trace term], but at that time we had come to the conclusion that it did not fulfill Newton’s approximation, which was erroneous.

Thus he regards the purely mathematical task of determining the most general fully covariant expression involving the g’s and their first and second derivatives as comparatively trivial and straightforward – as indeed it is for a competent mathematician. The Bianchi identities were already known, so there was no new mathematics involved. The difficulty, as Einstein stressed, was not in writing down the solution of this mathematical problem, but in conceiving of the problem in the first place, and then showing that it represents a viable law of gravitation. In this, Einstein was undeniably the originator, not only in showing that the field equations reduce to Newton’s law in the first approximation, but also in showing that they yield Mercury’s excess precession in the second approximation. Hilbert was suitably impressed when Einstein showed this in his paper of 18 November, and it’s important to note that this was how Einstein was spending his time around the 18th of November, establishing the physical implications of the fully covariant field equations, while Hilbert was busying himself with elaborating the mathematical aspects of the problem that Einstein had outlined the previous summer.
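For reference, the fully covariant field equations, including the trace term, can be written in modern notation as

    \[ R_{\mu\nu} - \tfrac{1}{2} g_{\mu\nu} R = \kappa T_{\mu\nu}, \]

where κ is the coupling constant 8πG/c^4. This is equivalent to R_{μν} = κ(T_{μν} - (1/2)g_{μν}T), and the trace term is precisely the (1/2)g_{μν}R (equivalently (1/2)g_{μν}T) piece whose omission Einstein refers to in the letter to Sommerfeld quoted above.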

It’s also worth noting that although they arrived at the same formulas, Hilbert and Einstein were working in fundamentally different contexts, so it would be somewhat misleading to say that they arrived at the same theoretical result. Already in 1921 Pauli commented on both the simultaneous discoveries and on the distinctions between what the two men discovered.

At the same time as Einstein, and independently, Hilbert formulated the generally covariant field equations. His presentation, though, would not seem to be acceptable to physicists, for two reasons. First, the existence of a variational principle is introduced as an axiom. Secondly, of more importance, the field equations are not derived for an arbitrary system of matter, but are specifically based on Mie’s theory of matter.

Whatever the true sequence of events and interactions, it seems that Einstein initially had some feelings of resentment toward Hilbert, perhaps thinking that Hilbert had acted ungraciously and stolen some of his glory. Already on November 20 Einstein had written to a friend

The theory is incomparably beautiful, but only one colleague understands it, and that one works skillfully at "nostrification". I have learned the deplorableness of humans more in connection with this theory than in any other personal experience. But it doesn't bother me.

(Literally the word “nostrification” refers to the process by which a country accepts foreign academic degrees as if they had been granted by one of its own universities, but the word has often been used to suggest the appropriation and re-packaging of someone else’s ideas and making them one’s own.) However, by December 20 he was able to write a conciliatory note to Hilbert, saying

There has been between us a certain unpleasantness, whose cause I do not wish to analyze. I have struggled against feelings of bitterness with complete success. I think of you again with untroubled friendliness, and ask you to do the same with me. It would be a shame if two fellows like us, who have worked themselves out from this shabby world somewhat, cannot enjoy each other.

Thereafter they remained on friendly terms, and Hilbert never publicly claimed any priority in the discovery of general relativity, and always referred to it as Einstein’s theory.

As it turned out, Einstein can hardly have been dissatisfied with the amount of popular credit he received for the theories of relativity, both special and general. Nevertheless, one senses a bit of annoyance when Max Born mentioned to Einstein in 1953 (two years before Einstein's death) that the second volume of Edmund Whittaker's book “A History of the Theories of Aether and Electricity” had just appeared, in which special relativity is attributed to Lorentz and Poincare, with barely a mention of Einstein except to say that "in the autumn of [1905] Einstein published a paper which set forth the relativity theory of Poincare and Lorentz with some amplifications, and which attracted much attention". In the same book Whittaker attributes some of the fundamental insights of general relativity to Planck and a mathematician named Harry Bateman (a former student of Whittaker’s). Einstein replied to his old friend Born

Everybody does what he considers right... If he manages to convince others, that is their own affair. I myself have certainly found satisfaction in my efforts, but I would not consider it sensible to defend the results of my work as being my own 'property', as some old miser might defend the few coppers he had laboriously scraped together. I do not hold anything against him [Whittaker], nor of course, against you. After all, I do not need to read the thing.

On the other hand, in the same year (1953), Einstein wrote to the organizers of a celebration honoring the upcoming fiftieth anniversary of his paper on the electrodynamics of moving bodies, saying

I hope that one will also take care on that occasion to suitably honor the merits of Lorentz and Poincare.

8.9 Paths Not Taken

Two roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth…
                Robert Frost, 1916

The Archimedean definition of a straight line as the shortest path between two points was an early expression of a variational principle, leading to the modern idea of a geodesic path. In the same spirit, Hero explained the paths of reflected rays of light based on a principle of least distance, which Fermat reinterpreted as a principle of least time, enabling him to account for refraction as well. Subsequently, Maupertuis and others developed this approach into a general principle of least action, applicable to mechanical as well as optical phenomena. Of course, as discussed in Chapter 3.4, a more correct statement of these principles is that systems evolve along stationary paths, which may be maximal, minimal, or neither (at an inflection point).
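In modern notation, Fermat's principle for a medium with refractive index n states that a ray between fixed endpoints A and B makes the optical path length stationary,

    \[ \delta \int_A^B n \, ds = 0, \]

and since the travel time along a path is the integral of n ds divided by c, this is the same as requiring the time of travel to be stationary, though not necessarily minimal.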

This is a tremendously useful principle, but as a realistic explanation it has always been at least slightly suspect, because (for example) it isn't clear how a single ray of light (or a photon) moving along a particular path can "know" that it is an extremal path in the variational sense. To illustrate the problem, consider a photon traveling from A to B through a transparent medium whose refractive index n increases in the direction of travel, as indicated by the solid vertical lines in the drawing below:

Since the path AB is parallel to the gradient of the refractive index, it undergoes no refraction. However, if the lines of constant refractive index were tilted as shown by the dashed diagonal lines in the figure, a ray of light initially following the path AB will be refracted and arrive at C, even though the index of refraction at each point along the path AB is identical to what it was before, when there was no refraction. This shows that the path of a light ray cannot be explained solely in terms of the values of the refractive index along the path. We must also consider the transverse values of the refractive index along neighboring paths, i.e., along paths not taken.
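This dependence on the paths not taken is explicit in the ray equation of geometrical optics (quoted here in its standard form), which governs a ray with unit tangent vector dr/ds in a medium of refractive index n(r):

    \[ \frac{d}{ds}\left( n \frac{d\mathbf{r}}{ds} \right) = \nabla n \]

The ray bends only in response to the component of the gradient of n transverse to its direction of travel, and that transverse component is determined by the values of n at neighboring points off the ray.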

The classical wave explanation, proposed by Huygens, resolves this problem by denying that light can propagate in the form of a single ray. According to the wave interpretation, light propagates as a wave front possessing transverse width. A small section of a propagating wave front is shown in the figure below, with the gradient of the refractive index perpendicular to the initial trajectory of light:

Clearly the wave front propagates more rapidly on the side where the refractive index is low (viz, the speed of light is high) than on the side where the refractive index is high. As a result, the wave front naturally turns in the direction of higher refractive index (i.e., higher density). It's easy to see that the amount of deflection of the normal to the wave front agrees precisely with the result of applying Fermat's principle, because the wave front represents a locus of points that are at an equal phase distance from the point of emission. Thus the normal to the wave front is, by definition, a stationary path in the variational sense.

More generally, Huygens articulated the remarkable principle that every point of a wave front can be regarded as the origin of a secondary spherical wave, and the envelope of all these secondary waves constitutes the propagated wave front. This is illustrated in the figure below:

Huygens also assumed the secondary wave originating at any point has the same speed and frequency as the primary wave at that point. The main defect in Huygens' wave theory of optics was its failure to account for the ray-like properties of light, such as the casting of sharp shadows. Because of this failure (and also the inability of the wave theory to explain polarization), the corpuscular theory of light favored by Newton seemed more viable throughout the 18th century. However, early in the 19th century, Young and Fresnel modified Huygens' principle to include the crucial element of interference. The modified principle asserts that the amplitude of the propagated wave is determined by the superposition of all the (unobstructed) secondary wavelets originating on the wave front at any prior instant. (Young also proposed that light was a transverse rather than longitudinal wave, thereby accounting for polarization - but only at the expense of making it very difficult to conceive of a suitable material medium, as discussed in Chapter 3.5.)

In his critique of the wave theory of light Newton (apparently) never realized that waves actually do exhibit "rectilinear motion", and cast sharp shadows, etc., provided that the wavelength is small on the scale of the obstructions. In retrospect, it's surprising that Newton, the superb experimentalist, never noticed this effect, since it can be seen in ordinary waves on the surface of a pool of water. Qualitatively, if the wavelength is large relative to an aperture, the phases of the secondary wavelets emanating from every point in the mouth of the aperture to any point in the region beyond will all be within a fraction of a cycle from each other, so they will (more or less) constructively reinforce each other. On the other hand, if the wavelength is very small in comparison with the size of the aperture, the region of purely constructive interference on the far side of the aperture will just be a narrow band perpendicular to the aperture.

The wave theory of light is quite satisfactory for a wide range of optical phenomena, but when examined on a microscopic scale we find the transfer of energy and momentum via electromagnetic waves exhibits a granularity, suggesting that light comes in discrete quanta (packets). Planck had originated the quantum theory in 1900 by showing that the so-called ultra-violet catastrophe entailed by the classical theory of blackbody radiation (which predicted infinite energy at the high end of the spectrum) could be avoided - and the actual observed radiation could be accurately modeled - if we assume oscillators lining the walls of the cavity can absorb and emit electromagnetic energy only in discrete units proportional to the frequency, ν. The constant of proportionality is now known as Planck's constant, denoted by h, and has the incredibly tiny value 6.626×10^-34 Joule seconds. Thus a physical oscillator with frequency ν emits and absorbs energy in integer multiples of hν.

Planck's interpretation was that the oscillators were quantized, i.e., constrained to emit and absorb energy in discrete units, but he did not (explicitly) suggest that electromagnetic energy itself was inherently quantized. However, in a sense, this further step was unavoidable, because ultimately light is nothing but its emissions and absorptions. It's not possible to "see" an isolated photon. The only perceivable manifestation of photons is their emissions and absorptions by material objects. Thus if we carry Planck's assumption to its logical conclusion, it's natural to consider light itself as being quantized in tiny bundles of energy hν. This was explicitly proposed by Einstein in 1905 as a heuristic approach to understanding the photoelectric effect.

Incidentally, it was this work on the photoelectric effect, rather than anything related to special or general relativity, that was cited by the Nobel committee in 1921 when Einstein was finally awarded the prize. Interestingly, the divorce settlement of Albert and Mileva Einstein, negotiated through Einstein's faithful friend Besso in 1918, included the provision that the cash award of any future Nobel prize which Albert might receive would go to Mileva for the care of the children, as indeed it did. We might also observe that Einstein's work on the photoelectric effect was much more closely related to the technological developments leading to the invention of television than his relativity theory was to the unleashing of atomic energy. Thus, if we wish to credit or blame Einstein for laying the scientific foundations of a baneful technology, it might be more accurate to cite television rather than the atomic bomb.

In any case, it had been known for decades prior to 1905 that if an electromagnetic wave shines on a metallic substance, which possesses many free valence electrons, some of those electrons will be ejected from the metal. However, the classical wave theory of light was unable to account for several features of this observed phenomenon. For example, according to the wave theory the kinetic energy of the ejected electrons should increase as the intensity of the incident light is increased (at constant frequency), but in fact the maximum kinetic energy of the ejected electrons is observed to depend only on the frequency of the light, not on its intensity. Also, the wave theory predicts that the photoelectric effect should be present (to some degree) at all frequencies, whereas we actually observe a definite cutoff frequency, below which no electrons are ejected, regardless of the intensity of the incident light. A more subtle point is that the classical wave theory predicts a smooth continuous transfer of energy from the wave to a particle, and this implies a certain time lag between when the light first strikes the metal and when electrons begin to be ejected. No such time lag is observed.
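These regularities are captured by the relation Einstein proposed (discussed below): the maximum kinetic energy of an ejected electron is K_max = hν - W, where W is the work function of the metal. A small numerical sketch, with an assumed, purely illustrative work function:

    # Photoelectric effect in the photon picture: K_max = h*nu - W.
    h = 6.626e-34            # Planck's constant, J s
    eV = 1.602e-19           # joules per electron-volt
    W = 4.3 * eV             # assumed work function of the metal, J

    nu_cutoff = W / h        # below this frequency no electrons are ejected at all
    print("cutoff frequency: %.2e Hz" % nu_cutoff)

    nu = 1.5e15              # frequency of the incident (ultraviolet) light, Hz
    K_max = h * nu - W
    print("max kinetic energy: %.2f eV" % (K_max / eV))   # independent of intensity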

Einstein's proposal for explaining the details of the photoelectric effect was to take Planck's quantum theory seriously, and consider the consequences of assuming that light of frequency ν consists of tiny bundles - later given the name photons - of energy hν. Just as Planck had said, each material "oscillator" emits and absorbs energy in integer multiples of this quantity, which Einstein interpreted as meaning that material particles (such as electrons) emit and absorb whole photons. This is an extraordinary hypothesis, and might seem to restore Newton's corpuscular theory of light. However, these particles of light were soon found to possess properties and exhibit behavior quite unlike ordinary macroscopic particles. For example, in 1924 Bose gave a description of blackbody radiation using the methods of statistical thermodynamics based on the idea that the cavity is filled with a "gas" of photons, but the statistical treatment regards the individual photons as indistinguishable and interchangeable, i.e., not possessing distinct identities. This leads to the Bose-Einstein distribution

    n(E) = 1 / (A e^{E/kT} - 1)

which gives, for a system in equilibrium at temperature T, the expected number of particles in a quantum state with energy E. In this equation, k is Boltzmann's constant and A is a constant determined by the number of particles in the system. Particles that obey Bose-Einstein statistics are called bosons. Compare this distribution with the classical Boltzmann distribution, which applies to a collection of particles with distinct identities (such as complex atoms and molecules)

    n(E) = (1/A) e^{-E/kT}

A third equilibrium distribution arises if we consider indistinguishable particles that obey the Pauli exclusion principle, which precludes more than one particle from occupying any given quantum state in a system. Such particles are called fermions, the most prominent example being electrons. It is the exclusion principle that accounts for the variety and complexity of atoms, and their ability to combine chemically to form molecules. The energy distribution in an equilibrium gas of fermions is

    n(E) = 1 / (A e^{E/kT} + 1)
The reason photons obey Bose-Einstein rather than Fermi statistics is that they do not satisfy the Pauli exclusion principle. In fact, multiple bosons actually prefer to occupy the same quantum state, which led to Einstein's prediction of stimulated emission, the principle of operation behind lasers, which have become so ubiquitous today in CD players, fiber optic communications, and so on. Thus the photon interpretation has become an indispensable aspect of our understanding of light.
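To make the contrast between these three distributions concrete, here is a small numerical sketch (the temperature and the constant A are arbitrary illustrative choices) evaluating the expected occupancies at a few sample energies:

    import numpy as np

    k = 1.380649e-23            # Boltzmann's constant, J/K
    T = 300.0                   # an assumed temperature, K
    A = 1.0                     # illustrative constant (fixed by particle number in a real system)
    E = np.array([0.5, 1.0, 2.0, 4.0]) * k * T     # sample energies, in units of kT

    bose  = 1.0 / (A * np.exp(E / (k*T)) - 1.0)    # Bose-Einstein
    boltz = (1.0 / A) * np.exp(-E / (k*T))         # classical Boltzmann
    fermi = 1.0 / (A * np.exp(E / (k*T)) + 1.0)    # Fermi-Dirac

    for row in zip(E / (k*T), bose, boltz, fermi):
        print("E = %.1f kT   BE: %.3f   MB: %.3f   FD: %.3f" % row)

At low energies the Bose-Einstein occupancy exceeds the classical value (the bosons "bunch"), the Fermi-Dirac occupancy never exceeds one, and all three agree at energies much larger than kT.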

However, it also raises some profound questions about our most fundamental ideas of space, time, and motion. First, the indistinguishability and interchangeability of fundamental particles (fermions as well as bosons) challenges the basic assumption that distinct objects can be identified from one instant of time to the next, which (as discussed in Chapter 1.1) underlies our intuitive concept of motion. Second, even if we consider the emission and absorption of just a single particle of light, we again face the question of how the path of this particle is chosen from among all possible paths between the emission and absorption events. We've seen that Fermat's principle of least time seems to provide the answer, but it also seems to imply that the photon somehow "knows" which direction at any given point is the quickest way forward, even though the knowledge must depend on the conditions at points not on the path being followed. Also, the principle presupposes either a fixed initial trajectory or a defined destination, neither of which is necessarily available to a photon at the instant of emission.

In a sense, the principle of least time is backwards, because it begins by positing particular emission and absorption events, and infers the hypothetical path of a photon connecting them, whereas we should like (classically) to begin with just the emission event and infer the time and location of the absorption event. The principle of Fermat can only assist us if we assume a particular definite trajectory for the photon at emission, without reference to any absorption. Unfortunately, the assignment of a definite trajectory to a photon is highly problematical because, as noted above, a photon really is nothing but an emission and an associated absorption. To speak about the trajectory of a free photon is to speak about something that cannot, even in principle, ever be observed.

Moreover, many optical phenomena are flatly inconsistent with the notion of free photons with definite trajectories. The wavelike behavior of light, such as that demonstrated in Young's two-slit interference experiment, defies explanation in terms of free particles of light moving along free trajectories independent of the emission and absorption events. The figure below gives a schematic of Young's experiment, showing that the intensity of light striking the collector screen exhibits the interference effects of the light emanating from the two slits in the intermediate screen.

This interference pattern is easily explained in terms of interfering waves, but for light particles we expect the intensity on the collector screen to be just the sum of the intensities given by each slit individually. Still, if we regard the flow of light as consisting of a large number of photons, each with their own phases, we might be able to imagine that they somehow mingle with each other while passing from the source to the collector, thereby producing the interference pattern. However, the problem becomes more profound if we reduce the intensity of the light source to a sufficiently low level that we can actually detect the arrival of individual photons, like clicks on a Geiger counter, by an array of individual photo-detectors lining the collector screen. Each arrival is announced by just a single detector. We can even reduce the intensity to such a low level that no more than one photon is "in flight" at any given time. Under these conditions there can be no "mingling" of various photons, and yet if the experiment is carried on long enough we find that the number of arrivals at each point on the collector screen matches the interference pattern.
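A minimal sketch of the contrast between the two predictions, for assumed, purely illustrative values of the wavelength, slit separation, and screen distance:

    import numpy as np

    lam = 500e-9        # wavelength of the light, m (assumed)
    d   = 50e-6         # slit separation, m (assumed)
    L   = 1.0           # distance from the slits to the collector screen, m (assumed)

    y = np.linspace(-0.05, 0.05, 9)          # positions on the collector screen, m
    phase = 2 * np.pi * (d * y / L) / lam    # phase difference between the two paths (small angles)

    I_wave = 4 * np.cos(phase / 2)**2        # two interfering amplitudes (ideal point slits)
    I_sum  = 2 * np.ones_like(y)             # naive sum of the two single-slit intensities

    for yi, iw, isum in zip(y, I_wave, I_sum):
        print("y = %+.3f m   interference: %.2f   simple sum: %.2f" % (yi, iw, isum))

The accumulated photon counts follow the first column, not the second, even when the photons arrive one at a time.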

The modern theory of quantum electrodynamics explains this behavior by denying that photons follow definite trajectories through space and time. Instead, an emitter has at each instant along its worldline a particular complex amplitude for emitting a photon, and a potential absorber has a complex amplitude for absorbing that photon. The amplitude at the absorber is the complex sum of the emission amplitudes of the emitter at various times in the past, corresponding to the times required to traverse each of the possible paths from the emitter to the absorber. At each of those times the light source had a certain complex amplitude for emitting a photon, and the phase of that amplitude advances steadily along the timeline of the emitter, giving a frequency equal to the frequency of the emitted light.

For example, when we look at the reflection of a light source on a mirror our eye is at one end of a set of rays, each of slightly different length, which implies that the amplitude for each path corresponds to the amplitude of the emitter at a slightly different time in the past. Thus, we are actually receiving an image of the light source from a range of times in the past. This is illustrated in the drawing below:

If the optical path lengths of the bundle of incoming rays in a particular direction are all nearly equal (meaning that the path is "stationary" in the variational sense), their amplitudes will all be nearly in phase, so they reinforce each other, yielding a large complex sum. On the other hand, if the lengths of the paths arriving from a particular direction differ significantly, the complex sum of amplitudes will be taken over several whole cycles of the oscillating emitter amplitude, so they largely cancel out. This is why most of the intensity of the incoming ray arrives from the direction of the stationary path, which conforms with Hero's equi-angular reflection.
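A numerical sketch of this cancellation (the geometry and wavelength are arbitrary illustrative units): sum the complex amplitude exp(ikL) contributed by each candidate reflection point on a mirror, where L is the total path length via that point, and compare a strip of mirror around the equal-angle point with an equally wide strip far away from it.

    import numpy as np

    lam = 1.0                            # wavelength (arbitrary units)
    k = 2 * np.pi / lam
    src = np.array([-5.0, 5.0])          # light source above the mirror (x, y)
    det = np.array([+5.0, 5.0])          # detector (the eye) above the mirror

    xs = np.linspace(-20.0, 20.0, 8001)  # candidate reflection points along the mirror (y = 0)
    path = (np.hypot(xs - src[0], src[1]) +   # source -> mirror point
            np.hypot(xs - det[0], det[1]))    # mirror point -> detector

    amp = np.exp(1j * k * path)               # complex amplitude contributed by each path

    def strip(x0, half_width=1.0):
        # magnitude of the summed amplitude from a strip of mirror centered at x0
        sel = np.abs(xs - x0) < half_width
        return abs(amp[sel].sum())

    print("strip at the equal-angle point:  %.0f" % strip(0.0))
    print("strip well away from that point: %.0f" % strip(10.0))

The strip containing the stationary path contributes an amplitude roughly an order of magnitude larger than the off-axis strip, whose contributions largely cancel, in accord with Hero's equal-angle reflection.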

To test the reality of this interpretation, notice that it claims the absence of reflected light at unequal angles is due to the canceling contributions of neighboring paths, so in theory we ought to be able to delete the paths corresponding to all but one phase angle of the emitter, and thereby make it possible to see non-Heronian reflected light. This is actually the principle of operation of a diffraction grating, where alternating patches of a reflecting surface are scratched away, at intervals proportional to the wavelength of the light. When this is done, it is indeed possible to see light reflected at highly non-Heronian angles, as illustrated below.

All of this suggests that the conveyance of electromagnetic energy from an emitter to an absorber is not well-described in terms of a classical free particle following a free path through spacetime. It also suggests that what we sometimes model as wave properties of electromagnetic radiation are really wave properties of the emitter. This is consistent with the fact that the wave function of a putative photon does not advance along its null worldline. See Section 9.10, where it is argued that the concept of a "free photon" is meaningless, because every photon is necessarily emitted and absorbed. If we compare a photon to a clap, then a "free photon" is like clapping with no hands.
