arXiv:cs/0501046v1 [cs.IT] 21 Jan 2005 rsn td spr fawdrpormamda xlrn th exploring at aimed program o Dyson[ wider encroach a (cf. traffic— of to “native” matter part attempt heavy is microscopic study by an c present of capacity like purposes, near bustle more computational jammed near-equilibrium for and already scale more turf finer a look and will finer putation a on anisms iy an aysS. otn A02215. MA Boston, St., Mary’s Saint 8 sity, ua rigt aetems fwaee aaiyi left. is capacity whatever of “secon most the the and na make medium, blind to recording a trying the be human degrade se would to and emp user” tending independently “first purpose Our cess act the practical typically, users (For class.) maliciously; successive not this the erased. where into never case falls the but “ether” time electromagnetic any the at ma plate—where added photographic be or tape punched as medium—such h os aeadi ital xc vrms fterange—bec the tape. of used most heavily over of exact limit accur the virtually quite in is is exact and it case nonetheless, worst approximate; infinit of the only amount of any is sequence for but holds a principle usage, for strong holds The “effectiv pri only increments. equivalence but appropriate age two exact, an identify is We by principle details? weak tape other all used tap forget virgin of and of piece length a shorter characterize a purposes—to transmission tion u) htw yial aei idfr“h te party” other “the made for be mind can in is have goals typically any we (if what out); goals cooperation independent no pursues volunteers and party, other non-malicious, other the presumed where co- the situation On while a a in by interested context. are generated pre-planned we hand, a tape information- in the a partner stress reusing operative They of aspects [6]). engineering cf. (also Shamir[7] and “effective its by details? simply other one all tape can forget used or and of degrees, length” different piece to a used between characterize been differences have qualitative that any tapes there of Are length tape? shorter virgin a purposes—to transmission information actual the on depend tape. achieve will the and to of tape, expect cannot conditions virgin can with made Bob than already less density holes is storage Since the holes able remove undone, not there. be be sense but already to ones can new are which wants punch that unit, and He tape read/punch the tape data. in holes standard reuse own to a his like for use would storing he to care for used tape: doesn’t the tape of on He the already amount is conditions”. that large data good the a “in dump, tape punched Alice’s through maging medium noisy of capacity storage for ciples osams xcl,cmsa i fasurprise. a of a as comes exactly, almost does 1 oms ooi( Toffoli Tommaso owa xeti egho sdtp euvln”frinform “equivalent”—for tape used of length a is extent what To Abstract h atta togeuvlnede o odeaty u then but exactly, hold not does equivalence strong that fact The h hm edvlpi opeetr ota fRivest of that to complementary is develop we theme The “equivalent”—for tape used of length a is extent what To rum- found, student—has science computer poor Bob—a Keywords aua processes natural shmn eoemr rfiin tepotn hsclmech physical exploiting at proficient more become humans As hroyaiso sdpnhdtp:Aweak A tape: punched used of Thermodynamics W td h eetdueo oooi recording monotonic a of use repeated the study —We hroyaiso rt-nemda qiaec prin- Equivalence media, write-once of Thermodynamics — [email protected] . 1 n togeuvlneprinciple equivalence strong a and ,EEDprmn,Bso Univer- Boston Department, ECE ), cpe.The nciples. incremental t vnin even ate ai son is hasis ?Cnwe Can e? fihy but lfishly, oms Toffoli Tommaso smlus- esimal ua pro- tural ].The 4]). length” e sr a user” d k can rks skind is ,also s, oming om- the a- it n - h ntutost h uc ntare unit punch the to instructions The nutv debriefing. intuitive own. our make we have”) you (“Do moral what whose with Costa[2], best substance by detailed the ” in dirty not re- on if “ paper spirit in is A one present punching. “stuck- the before to and or lated punching, during (cells during sensed tape faults sensed the at-1” punch) dy- on not that faults will extend. algorithms “stuck-at-0” that and coding to simplify discuss adjust [5] we namically references and results see [3] (also whose References [8] of in some investigated therein), first was tape used alda called ihtefloigrslso h tape the on results following the with stu hrceie yasnl ubr aey h hole called the namely, be number, density—will single punch) a (or by cells—and characterized individual thus the is for distributions independent cal ihacnnclhl itiuino density of distribution or hole usage canonical of a round with each on that sume nwa olw l itiuin ilb ail understood tacitly be canonical. will be reason, distributions to this all For follows stage. what successive in every at canonical be canon- will course, of starting with is, that, ical distribution imply tape—whose assumptions virgin usage from our Thus, well. as ueta h nevnn ucigpoespcso the on packs process punching the tape intervening density the hole that uniform sume a with out p n h eutn oedistribution hole resulting the and oe.Fo h bv supin n a rv that prove can one block distribution assumptions long punch above the sufficiently the both of From means by codes. asymp- achieved can be efficiency totically maximum such theorem, Shannon’s ihsatgdensity startng with fcmuainlregime. computational of sanwhl density hole new a is fyuhv otm tal edjust read all, at time no have you If randomly-punched of capacity channel cumulative The ahpsto ntetp hr oemyapa is appear may hole a where tape the on position Each oe(rpnh itiuinta atr noidenti- into factors that distribution punch) (or hole A h euto pligapnhdensity punch a applying of result The cell maximum p h w osbecl ttsare states cell possible two the ; —nu,pnh n uptdistributions output and punch, 0—input, = l tt cinnwstate new action state old ln spare blank oespare hole ln punch blank oepunch hole p ′ muto e nomto compatible information new of amount 1 = .Orientation I. p n agtdensity target and − (1 − q ile ya pia code optimal an by yielded p 7→ 7→ 7→ 7→ )(1 p canonical ′ − utemr,w as- we furthermore, ; blank hole hole hole p § stage ′ q — self-contained, V—a ) utb canonical be must . q punch p oahl density hole a to ′ hole h aestarts tape the codn to According . esalas- shall We . p and n comes and and blank spare (1) 1 . , 2

A canonical punch distribution entails that, once the input monotonic noise. In the channel diagram of Fig. 1, the hole density p is known, there is no further advantage in input variable X represents the instruction given to the knowing the position of the individual holes; in other words, punch unit while scanning a cell, and the output variable overpunching can be carried out in a data-blind fashion. Y represents the resulting cell state. An “error” occurs when a cell spared by the punch unit turns out already to Let’s examine a few distinguished cases. contain a hole. The conditional probability P (hole|spare) • If p = 0 the tape is blank—Bob can resell it as virgin tape. associated with this transition equals the current hole den- • If p = 1 − p = 1/2, the tape has already been utilized sity p. by Alice at its maximum information capacity of one bit per cell. That would seem to leave Bob with no room for P (x) x y P (y) further information storage. But remember that he doesn’t p ✲ ′ care about the old information: punching new holes will q spare • • blank p = pq ◗◗ p ◗ destroy some of it but will encode some of his own! In fact ◗ (see §III below), with a punch density q = 3/5, Bob can ◗ ◗ record on the tape as much as about .322 per cell. 1 ◗◗s ′ q punch • ✲ • hole p = pq +1q • If p = 1, the tape carries no information for Alice—just as in the case p = 0. However, now there is no way Bob Fig. 1 can put any information on it. Alice wantonly spoiled the Channel diagram of used punched tape viewed as a tape. monotonic-error binary channel.

II. Notation From the joint and marginal distributions of X and Y , If p is a probability, it will be convenient to write p for namely, 1 − p. Thus, in (1), y blank hole ′ p =1 − pq = pq, or q = p′ /p. x p q 1 − p q , (2) spare q p q pq We shall use natural logarithms throughout. It will be punch q 0 p convenient to write ln x for − ln x. The self-information function, defined as one obtains, for this channel operated at a punch density q, a mutual information y = x ln x, ′ ′ p will play an important role in the equivalence principles ∆I = H(p q) − qH(p)= H(p ) − H(p). (3) p discussed here (see §VII). The binary entropy function, defined by The quantity ∆I is the amount of new information that can H(p)= p ln p + p ln p, be encoded on a tape having a hole density p by punching it ′ is the average of the self-information function over the bi- with a density q, resulting in a new hole density p (p, q)= nary distribution {p, p}. 1 − pq. Both self-information and binary entropy, as defined The relation expressed by equation (3)—plotted in here, measure information in natural units or nats. Con- Fig. 2—completely characterizes the bulk properties of version of information quantities to binary units or bits is punched tape as a communication channel. The rest of this achieved by explicitly factoring out the constant paper is devoted to extracting some of its implications. The capacity C of the channel is the maximum of ∆I bit = ln 2 ≈ .693; over all possible values of q (or, equivalently, of p′). By ′ thus, for example, the entropy of four equally probable mes- equating to zero the derivative of ∆I with respect to p , sages is ln 4 = 2 ln 2 = 2 bit. d∆I p′ H(p) = ln + =0, If X and Y are random variables, P (x) will denote the dp′ p′ p probability that X = x, and P (x.y) the probability that X = x and Y = y. The mutual information between X one finds that this maximum occurs at and Y is defined as 1 1 q =1 − , or p′ = , (4) P (x)P (y) p(eH(p)/p + 1) e−H(p)/p +1 {X; Y } = P (xy) ln . P (x.y) b b Xx,y where ∆I attains the value − For more background on information theory, see the ex- C = ln(e H(p)/p +1)= ln p′, (5) cellent introduction by Abramson[1]. as plotted in Fig. 3. In particular, for p =1b /2, III. Used tape as a monotonic binary channel 3 4 5 q = , p′ = , and C = ln ≈ .322 bit. Under the above assumptions (§I), used punched tape 5 5 4 may be viewed as a communication channel affected by b b 3

ln 2 1 1 1 p=0 (×bit) ↑ ∆I = H(p q) − qH(p) q ↑ 1 b q = 1 − ∆I H(p)/p p=1/4 p(e + 1) 1−1/e b .632

p=1/2 3/5 .6 ln 5 .322 4 1/2 .5

p=3/4

0 p=1 0 1 3 − 1 0 2 5 1 e 1 q → Fig. 2 Mutual information ∆I of the “used tape” channel, plotted as a function of the punch density q for various values of the current hole density p treated as a parameter. The 0 0 maximum of each curve is marked. 0 1/2 1 p → Fig. 4 ln 2 1 Plot of the optimal punch density q vs current hole density (×bit) p. The distinguished points are discussed in the text. That ↑ b − lying on the q = p line is discussed in §IV-B. C C = ln(e H(p)/p − 1) b

length of tape grows as n ln n; therefore, the amount per unit length is unbounded!

5 ln 4 .322 B. Tape wars averted We have seen that, if the original tape was punched at a density p by Alice, with an attendant rate H(p) for her 0 0 message, then Bob can achieve his channel capacity C(p) 1 1 3 0 4 2 4 1 p → as in (5) by punching at density q(p) as in (4). In the Fig. 3 process, Alice’s original message is, of course, degraded. In Channel capacity of used tape, C, as a function of the fact, if Alice tries to read back herb message, she will find current hole density p; the dots match those of Fig. 2. it contaminated by the same amount of one-way noise as if it had gone through the channel described by exchanging q and p in Fig. 1 and table (2). Suppose now that Alice, realizing that her tapes are go- IV. Cooperation and competition ing to be reused (or concurrently used—since, as we have For sake of contrast with the current context of self- seen, the two punching operations commute) by Bob, de- ish, independent utilization of the tape by each successive cides to encode her next batch of tapes so as to make her party, in the following two subsections we shall briefly dis- messages readable even after an anticipated punching by cuss the possibilities of cooperation and competition. Bob at density q. According to (4) and Fig. 4, as a pre- ventive measure she will have to shift her punch density p A. You shall receive an hundredfold to a higher value than 1/2, thus achieving a lower rate but In two successive selfish transmission stages starting greater resistance to Bob’s tampering. When Bob realizes from virgin tape, Alice got 1 bit’s worth of message out that, he will be forced to shift his punch density q to a of each cell and Bob .322 bit, for a total of 1.322 bit. By higher value—and so forth. collaborating, they could do much better[7], [6]. In fact, if This is not a zero-sum game: as the arms race unfolds, Alice and Bob worked in concert, with a very simple code each party will end up storing progressively less informa- they could each get two bits’ worth of message out of ev- tion on the tape. Will the race lead to the mutual de- ery three cells, for a total of 4/3 ≈ 1.333 bit/cell; with struction of information capacity? Fortunately, the curve long block codes, they could get up to about 1.55 bit/cell. of Fig. 4 intersects the line q = p and there has a slope less The advantages of collaboration show up even better when than 1. Thus, the race converges to a stable point (with one can plan ahead a long series of transmission stages q = p ≈ 0.609), where eachb party achieves an effective with a long length of tape: in this situation, the cumula- storage capacity of ≈ .240 bit/cell. tive amount of message worth one can get out of an n-cell The sum of the two capacities—and these are coexist- 4 ing capacities, with both messages readable at the same Sue: That will be about 119,000 bits. time!—is about 0.48 bit/cell, to be compared with the Bob, punching keys on a calculator: Hey, this time, 1 bit/cell Bob and Alice could have achieved by “space- it will only shrink to 119,000/333,333=.357 of its pre- sharing” the tape (e.g., one cell for Alice, one for Bob, transmission capacity [with an accusing tone] Are you and so forth). Thus, the attempt by the two parties to sure you made the best use of my tape last time? concurrently use the monotonic-write tape, performed in a Sue, chuckling: Cool off, Bob! Every housewife knows selfish but rational way, results in an overall loss of storage that used tape shrinks less! In fact, really ripe tape only capacity that is substantial but not crippling. shrinks to 1/e ≈ .368 of its previous capacity upon each It must be noted that, even though at equilibrium they usage. are in a symmetric situation, Alice and Bob cannot use the very same block code to encode their messages on the Bob: And brand new tape? tape. To avoid interference the two codes must be practi- Sue: New tape is the worst! It will shrink to ln 5/ ln 2−2 ≈ cally uncorrelated or “mutually orthogonal”; this is always .322 of its previous capacity. Here’s the whole picture! possible with long enough block codes. (Fig. 5)

1/e V. Intercom dialogues ··· ∞ .368 ln 5 3 4 5 6 7 8 9 10 −2 2 .322 We introduce the issue of tape equivalence by means of ln 2 1 two dialogues. The length ℓ of a piece of tape is the number of cells it contains. Because of the canonical distribution ↑ of both holes and punches (§I), the overall capacity of a s tape of length ℓ is ℓ times the capacity of a single cell, and similarly for the mutual information. 0 0 0 .9 .99 .999 .9999 1 p → Dialogue 1 Fig. 5 ′ Bob is now an old and stingy facilities officer at Caltech. “Shrinkage coefficient” s = C(p )/C(p) for successive He can no longer see the individual holes on the tape— full-capacity usages (labeled 1, 2,b . . . , ∞) of an initially his vision is blurred—and he wouldn’t any longer know virgin tape. As the tape gets more thoroughly used, the how to start writing a block code. All he cares about is shrinkage coefficient rapidly converges to 1/e. tape as a bulk commodity, and getting the most out of it. He is assisted by Sue, who physically handles the tape and knows how to devise appropriate block codes. Sue has The two physical parameters of a piece of tape, namely, standing instructions to recycle paper tape to the best of its length ℓ (in cells) and its current hole density p, com- her capabilities and not to bother Bob with details. pletely characterize its “response”—in terms of amount of information transmitted and capacity left—upon each suc- Bob, on the intercom: Sue, we have to send a million-bit cessive usage, including usages with a punch density q< q message to MIT. Get a piece of tape. (where some of the capacity is saved for later) or q > q Sue, from the mail room: I’ve gothere a reel of tape with (where some capacity is wasted), according to equationsb an overall capacity of one million bits. [She doesn’t tell (1), (3), and (5). b Bob whether that’s a thousand feet of virgin tape, or In particular, the capacity of a piece of tape of length ℓ is perhaps ten thousand feet of heavily used tape.] ℓC(p) (cf. (5)). This can be thought of—if we measure ca- Bob: Good! Here is the message. Don’t waste any capac- pacity in bits (see §II)—as the reduced length of the tape— ity, and make sure you get the tape back from MIT so i.e., the number of cells of virgin tape having the same over- we can reuse it! By the way, what will be the capacity all capacity. Bob would have been delighted to find that left on the tape after this message? I want to enter it as two pieces of tape having the same reduced length are com- an asset in my inventory sheet. pletely equivalent for information-transmission purposes. Such an equivalence principle would allow him to charac- Sue: That will be 333,000 bits. terize a piece of tape by means of a single information- Bob: So, using this tape at capacity will leave it “shrunk” theoretical parameter—the reduced length—rather than to .333 of its previous capacity. Well, one third left is the two physical parameters ℓ and p, and greatly simplify better than nothing! his inventory bookkeeping. Bob, a week later: Sue, here is another message for MIT. If such an equivalence held, then, as a specific conse- Since it happens to be 333,000 bits long, let’s use the tape quence, the shrinkage coefficient of Dialogue 1, defined as you got back from them. [We assume that, after decod- C(p′) ing a message, MIT does not keep a record of the detailed s(p)= , C(p) hole pattern received. That might be used for improving b transmission efficiency, but at substantial storage cost.] would be independent of p. Unfortunately, as we have seen What will be the capacity left on the tape after this mes- in the dialogue, this is only approximately true (Fig. 5). sage? We’ll return to this problem, with better tools, in §VII. 5

Dialogue 2 can tell the difference between fresh tape and well-worn Sue is on vacation. Her temporary replacement, Willie, tape by the fact that the former exhibits just a little is being indoctrinated by Bob about the need to conserve more friction than the latter. tape. To test his coding capabilities, Bob chooses a spool VI. Weak equivalence of tape just like the one he gave Sue the first time. Let us explore in more detail what Bob discovered with Bob: Here is a length of used tape, Willie, and a million-bit Willie’s help. message to be sent to MIT. Please transmit the message Suppose that we start with virgin tape and record on it as efficiently as you can. a small amount dI of information by punching it at a very Willie: Is it urgent? low density. We ship the tape but ask the recipient to send Bob: Not, really. Take your time, but do a good job! it back to us after reading the message. We then record on this “slightly used” tape an additional small amount Bob, a month later: Well, did you get the tape back from of information,3 further increasing the hole density. We MIT? continue in this way, sending one after the other a large Willie: Here it is! number of messages each having a small information con- Bob: What’s its capacity now? tents, until the tape is completely filled with holes. If at Willie: 580,000 bits, more or less. each stage the encoding is done optimally, what is the cu- mulative information dI of the messages we sent? Bob: What? It only shrank to .580 of its original length? Assume that at a genericR stage of this process we start How did you manage that? ′ with a hole density p and increase it to p = p + dp by Willie: You know, haste makes waste. So I first encoded issuing punch commands with a probability dq per cell. only a small fraction of the message on the tape, using The channel diagram is the same as Fig. 1, but with input a very low punch density. MIT decoded that, wrote it and output probabilities as in Fig. 6. down, and sent back the tape. Then I encoded on the same tape another increment of the message, sent it to P (x) x y P (y) MIT, and so on. The tape must have gone back and p dq spare • ✲ • blank pdq forth twenty times! ◗◗ p ◗ Bob: In the limit of an infinite number of infinitesimal ◗ ◗ increments, how much information could you transmit ◗ 1 ◗◗s in this way? dq punch • ✲ • hole pdq +1dq Willie: Starting from virgin tape, about 2.37 bits/cell 2 Fig. 6 π (precisely, ln 2 ). Channel diagram for used tape, with input and output Bob: That’s amazing! probabilities corresponding to incremental use of the Willie: And, of course, at any intermediate moment the channel capacity. transmission “mileage” already used plus that which is still left on the tape equals a constant—provided you The hole density increment is dp = pdq, as the blanks, always travel very slowly. which appear with density p, are turned into holes with Bob: I got it! Your “mileage left” is the effective length I probability dq, while the holes, with density p, remain unaf- was looking for. No matter how different they look phys- fected. The mutual information of this infinitesimal punch- ically, two pieces of tape (say, one short and fresh and ing operation, calculated from (3) using dq in place of q, the other long and stale) having the same effective length is are equivalent for information transmission purposes. p + dp dI = H(p + dp) − H(p) (6) Willie: Slow down, Bob! That is true only as long as you p use them up slowly. By comparing Sue’s performance p H(p) ln p with mine, you realize that, when one tries to cram onto = ln + dp = dp. (7) a tape a substantial fraction of its channel capacity at  p p  p once, there are losses by “friction”, as it were.2 Well, one The indefinite integral of the integrand in the last expres- 2This behavior is qualitatively similar to that of mechanical sys- sion is tems. Consider a battery of internal resistance R connected to a load ln p dp = −Li2(1 − p), of impedance r. It will be convenient to use the normalized variable Z 1 − p p = 1/(1 + r/R), which goes from 0 to 1 as r/R goes from ∞ to 0. 4 The maximum power transfer occurs when p = 1/2 (i.e., r = R); in where Li2 is the dilogarithm function. Thus, the effective this case, half of the energy is dissipated by friction in R. As p → 0, capacity of a tape of hole density p, i.e., the total amount energy is transfered to the load more slowly but less of it is wasted by friction. (As p → 1, one gets less power out and wastes a greater 3For this, we need, of course, a very long block code. fraction of the energy.) Indeed, to an untrained eye the power trans- 4 This is one of the polylogarithm functions, defined by Lin(z) = fer curve 2p(1 − p)—an inverted parabola—is hard to tell apart from k ∞ z the binary entropy curve H(p). k=1 kn . P 6 of information that can be transmitted via it in successive a more natural measure of a tape’s information capacity small increments until all holes have been punched up, is than the reduced length introduced in Dialogue 1. Armed with this measure, let us now turn our attention 1 1 Q(p)= dI = −Li2(1 − x) = Li2(p), (8) from the special case of the limit of an infinite sequence Zp p of infinitesimal messages to the general case of finite-size as plotted in Fig. 7 (compare with the qualitatively similar messages, where the weak principle is not applicable. behavior of C, in Fig. 3); for virgin tape (p = 0), the Our goal is to eliminate the physical parameters p and 2 q between equations (3) and (8), and thus write a relation effective capacity is Li2(1) = π /6 (cf. [8]). Note that, by (8), directly between (a) the effective length λ of a piece of dI = −dQ. tape before the transmission of a message, (b) the effective length λ′ after the transmission, and (c) the amount S of Since Q is a function of state of the tape (i.e., it depends information conveyed by the message. If such a relation only on its state and not on the specific sequence of oper- exists, it may be assumed to be of the form ation that led to that state), dI is an exact differential. ′ f(λ, λ ,S)=0 π2/6 2.37 (×bit) and, since we are assuming a canonical hole distribution before and after punching, it must satisfy the scaling prop- Q = Li2(p) erty ′ ↑ f(aλ, aλ ,aS)=0 forany a. Q ′ Setting, as a special case, a = 1/λ , we obtain a relation between two variables g(σ, µ)= f(σ, 1,µ)=0, where λ′ Q(p′) S I(p, q) σ = = and µ = = . 0 0 λ Q(p) λ Q(p) 0 1 1 3 1 4 2 4 p → The variable µ—which is the mutual information for a Fig. 7 given stage of utilization of the tape—can be thought of Effective capacity Q as a function of the current hole as the information rate per unit of effective length of the density p. tape, and σ as the shrinkage coefficient attendant to that stage. Since the variables σ and µ depend on two parameters, p The above quantities are on a per-cell basis. Let us de- and q, we cannot a priori expect to eliminate both param- fine the effective length (cf. Dialog 2) of a piece of tape of eters when solving for µ with respect to σ. However, for length ℓ and hole density p as λ = ℓQ(p). If by a sequence a given initial hole density p treated as a fixed parameter, of small incremental messages we transmit an amount of we can eliminate just q and write information I per cell, and thus a total amount S = Iℓ for the entire piece of tape, the new effective length will µ = µp(σ). ′ 5 be λ = ℓ(Q − I). The corresponding shrinkage coefficient The result of this elimination, performed numerically for will be λ′ I S different values of p, are shown in Fig. 8, which also shows =1 − =1 − , the values of the eliminated parameter q on the µ(σ) curves. λ Q λ Paralleling the weak equivalence principle of the previous which is independent of the physical parameters ℓ and p section—which states that tapes having the same effective and depends only on the ratio between two information- capacity are indistinguishable at slow utilization rates—a theoretical quantities, i.e., the total amount S of informa- strong equivalence principle would be one that is valid for tion transmitted and the effective length λ of the tape. We any rate of utilization of the tape at any transmission stage, shall call this the weak equivalence principle for monotonic- from an infinitesimal hole-density increment (q close to 0) write media. to gross overpunching (q close to 1). I don’t know whether it is more surprising that, strictly speaking, punched tape VII. Strong equivalence does not obey a strong equivalence principle, or that, after Let us now explore in more detail what Bob discovered all, it turns out to do so to a very good approximation. In with Sue’s help. fact, as is clear from Fig. 8, after eliminating q between µ Whether we intend to utilize a piece of tape incremen- and σ some dependence on p remains, but this dependence tally, as in Dialogue 2, or in discrete installements, as in is slight in any case and rapidly vanishes as p approaches Dialogue 1, the effective length λ defined above provides 1. Intuitively, the one-parameter family of curves of Fig. 2 5This quantity is analogous to but distinct from the shrinkage co- nearly collapses—when expressed in terms of a more nat- efficient of Dialogue 1, which is a ratio of channel capacities. ural set of variables—onto a single curve (Fig. 8). 7

6 2 ln 2 .421 π p = 0 3/4 1/2 ❄ ✻ 1/e .368 p = 1 ✻ 1 q = 2 ✻

1/4 q = 3 .250 4 1 q = 4 ↑ µ

q = 1 q = 0 0 0 1 1 1 3 0 4 κ e 2 4 1 σ → Fig. 8 Mutual information per unit of effective length, µ, as a function of the shrinkage coefficient σ, for different values of the initial hole density p. The dotted lines represent loci of equal values for the running parameter q. For any value of p,

the maximum of µ represents the corresponding channel capacity given in units of effective length. The maximum of µ0 1 − 6 2 occurs at κ = 2 (1 π2 ln 2). For utilization rate below capacity, the same mutual information can be obtained with two different values of shrinkage, one corresponding to rational usage of the tape and the other to needlessly wasteful usage.

The curves µp(σ) all have slope −1 at σ = 1; this is an • in the “quasi-static” limit of slow utilization rate (weak expression of the weak equivalence principle (i.e., for small equivalence principle); q, the effective length decreases by an amount equal to the • for any utilization rate (strong equivalence principle) amount of information transmitted). They all have slope − exactly, but only in the limit of already heavily used ∞ at σ = 0, signifying that the waste of effective capac- tape, and ity increases precipitously when one punches at a density − approximately—but with good accuracy even in the much greater than that needed for transmitting at channel worst case—over the whole range of previous and future capacity. uses of the tape. The worst-case departure of the µp curves from the lim- iting curve µ1 = limp→1 µp occurs near the maximum bulge IX. Acknowledgements of the curves, and is substantially the same as the depar- ture of s from its 1/e limit as plotted in Fig. 5. The curves This research was funded in part by NSF (9305227-DMS) and in part by ARPA through the Ultra Program (ONR µp are not likely to be expressible in closed form; however, N00014-93-1-0660) and the CAM-8 project (ONR N00014- as is easy to prove, the limiting curve µ1 is nothing but the familiar self-information function µ = σ ln σ. To the same 94-1-0662). I am indebted to Peter Elias, Matteo Frigo, and approximation as the strong equivalence principle holds, Mark Smith for useful discussions, and to Ronald Rivest for this function gives the information-transfer characteristics some references. of punched tape (i.e., for any message, the capacity used References by it, that wasted, and that left after the message) over the tape’s entire utilization range. [1] Abramson, Norman, Information Theory and Coding, McGraw- Hill (1963). Let us remark that the self-information function appears [2] Costa, Max, “Writing on dirty paper”, IEEE Trans. Info. Theory in the limit also in Fig. 2. In fact, one can show that IT-29 (1983), 439–441. [3] Dolev, Danny, David Maier, Harry Marson, and Jeffrey Ull- I(p, q) man, “Correcting faults in write-once memory”, Proc. 16th Anu- lim = eq ln q. ual ACM Symp. on Theory of Computing, ACM (1984), 225–229. → p 1 C(p) [4] Dyson, Freeman, “Time without end: Physics and biology in an open universe”, Rev. Mod. Phys. 51 (1979), 447–460. VIII. Conclusions [5] Heegard, Chris, and Abbas El Gamal, “On the capacity of with defects,” IEEE Trans. Info. Theory IT-29 A piece of randomly punched tape is described by two (1983), 731–739. physical parameters—its length ℓ and its hole density p. [6] Maier, David, “Using write-once memory for database storage”, 1982 ACM Symposium on Principles of Database Systems, ACM We have raised the question of whether the tape’s be- (1982), 239–246. havior as an information transmission commodity can be [7] Rivest, Ronald, and Adi Shamir, “How to reuse a ‘Write-Once’ usefully characterized by a single information-theoretical memory”, Information and Control 55 (1982), 1–19. [8] Wolf, Jack, Aaron Wyner, Jacob Ziv, and J´anos Korner¨ , parameter—its effective length λ. We have concluded that “Coding for a Write-Once Memory”, AT&T Bell Lab. Tech. J. this is the case 63 (1984), 1089–1112. 8

Tommaso Toffoli Tommaso Toffoli received a Doctorate in Physics from the University of Rome, Italy, in 1967, and a Ph.D. in Com- puter and Communication Science from the University of Michigan, Ann Arbor, in 1976. In 1977 he joined the MIT Laboratory for Com- puter Science, eventually becoming the leader of the Information Mechanics group. In 1995 he joined the faculty of the Boston University ECE Department. His main area of interest, namely Information Mechanics, deals with fundamental connections between physical and computational processes. He has developed and pioneered the use of cellular automata machines, as a way of efficiently studying a vari- ety of synthetic dynamical systems that reflect constraints of physical law, such as locality, uniformity, and invertibility. Related areas of interest are: quantum computation; correspondence princi- ples between microscopic laws and macroscopic behavior; quantita- tive measures of “computation capacity” of a system—as contrasted to “information capacity,” and connections between Lagrangian ac- tion and amount of computation. A new initiative, Personal Knowledge Structuring, aims at developing cultural tools that will help ordinary people turn the computer into a natural extension of their personal faculties.