The Profession of IT
Peter J. Denning

The Locality Principle

Locality of reference is a fundamental principle of computing with many applications. Here is its story.

Locality of reference is one of the cornerstones of computer science. It was born from efforts to make virtual memory systems work well. Virtual memory was first developed in 1959 on the Atlas system at the University of Manchester. Its superior programming environment doubled or tripled programmer productivity. But it was finicky, its performance sensitive to the choice of replacement algorithm and to the ways compilers grouped code onto pages. Worse, when it was coupled with multiprogramming, it was prone to thrashing—the near-complete collapse of system throughput due to heavy paging. The locality principle guided us in designing robust replacement algorithms, compiler code generators, and thrashing-proof systems. It transformed from an unpredictable to a robust, self-regulating technology that optimized throughput without user intervention. Virtual memory became such an engineering triumph that it faded into the background of every operating system, where it performs so well at managing memory with multithreading and multitasking that no one notices.

The locality principle found application well beyond virtual memory. Today it directly influences the design of processor caches, disk controller caches, storage hierarchies, network interfaces, database systems, graphics display systems, human-computer interfaces, individual application programs, search engines, Web browsers, edge caches for Web-based environments, and computer forensics. Tomorrow it may help us overcome our problems with brittle, unreliable software.

I will tell the story of this principle, starting with its discovery to solve a multimillion-dollar performance problem, through its evolution as an idea, to its widespread adoption today. My telling is highly personal because locality was my focus during the first part of my career.

MANIFESTATION OF A NEED (1949–1965)

In 1949, the builders of the Atlas computer system at the University of Manchester recognized that computing systems would always have storage hierarchies consisting of at least main memory (RAM) and secondary memory (disk, drum). To simplify management of these hierarchies, they introduced the page as the unit of storage and transfer. Even with this simplification, programmers spent well over half their time planning and programming page transfers, then called overlays.


In a move to enable programming productivity to at least double, the Atlas system builders therefore decided to automate the overlaying process. Their "one-level storage system" (later called virtual memory) was part of the second-generation Atlas operating system in 1959 [5]. It simulated a large main memory within a small real one. The heart of their innovation was the novel concept that addresses named values, not memory locations. The CPU's addressing hardware translated CPU addresses into memory locations via an updatable page table map. By allowing more addresses than locations, their scheme enabled programmers to put all their instructions and data into a single address space. The file containing the address space was on the disk; the operating system copied pages on demand (at page faults) from that file to main memory. When main memory was full, the operating system selected a main memory page to be replaced at the next page fault.

Table 1. Milestones in development of the locality idea.

1959: Atlas includes the first virtual memory; a "learning algorithm" replaces pages referenced farthest in the future [5].
1961: IBM Stretch supercomputer uses spatial locality to prefetch instructions and follow possible branches.
1965: Wilkes introduces slave memory, later known as CPU cache, to hold most recently used pages and significantly accelerate effective CPU speed [9].
1966: Belady at IBM Research publishes a comprehensive study of virtual memory replacement algorithms, showing that those with usage bits outperform those without [1].
1966: Denning proposes the working set idea: the pages that must be retained in main memory are those referenced during a window of length T preceding the current time. In 1967 he postulates that working set memory management will prevent thrashing [2–4].
1968: Denning shows analytically why thrashing precipitates suddenly with any increase above a critical threshold of the number of programs in memory. Belady and Denning use the term locality for the program behavior property that working sets measure [2–4].
1969: Sayre, Brawn, and Gustavson at IBM demonstrate that programs with good locality are easy to design and cause virtual memory systems to perform better than a manually designed paging schedule.
1970: Denning gathers all extant results for virtual memory into the Computing Surveys paper "Virtual Memory," which was widely used in operating systems courses; it was the first coherent scientific framework for designing and analyzing dynamic memories [3].
1970: Mattson, Gecsei, Slutz, and Traiger of IBM publish "stack algorithms," modeling a large class of popular replacement policies including LRU and MIN; they offer surprisingly simple algorithms for calculating their paging functions in virtual memory [7].
1972: Spirn and Denning conclude that phase transition behavior is the most accurate description of locality [8].
1970–74: Abramson, Metcalfe, and Roberts report thrashing in Aloha and Ethernet communication systems; load control protocols prevent it.
1976: Buzen, Courtois, Denning, Gelenbe, and others integrate memory management into queueing network models, demonstrating that thrashing is caused by the paging disk transitioning into the bottleneck with increasing load. System throughput is maximum when the average working set space-time is minimum.
1976: Madison and Batson demonstrate that locality is present in symbolic execution strings of programs, concluding that locality is part of human cognitive processes transmitted to programs [6].
1976: Prieve and Fabry demonstrate VMIN, the optimal variable-space replacement policy; it has the same page fault sequence as the working set but lower space-time accumulation at phase transitions.
1978: Denning and Slutz define generalized working sets; objects are local when their memory retention cost is less than their recovery cost. The GWS models the stack algorithms, space-time variations of working sets, and all variable-space optimal replacement algorithms.
1980: Denning gathers the results of over 200 virtual-memory researchers and concludes that working set memory management with a single system-wide window size is as close to optimal as can practically be realized [4].

The Atlas system designers had to resolve two performance problems, either one of which could sink the system: translating addresses to locations, and replacing loaded pages. They quickly found a workable solution to the translation problem by storing copies of the most recently used page table entries in a small high-speed associative memory, later known as the address cache or the translation lookaside buffer. The replacement problem was a much more difficult conundrum.

Because the disk access time was about 10,000 times slower than the CPU instruction cycle, each page fault added a significant delay to a job's completion time. Therefore, minimizing page faults was critical to system performance. Since minimum faults means maximum inter-fault intervals, the ideal page to replace from main memory is the one that will not be used again for the longest time. To accomplish this, the Atlas system contained a "learning algorithm" that hypothesized a loop cycle for each page, measured each page's period, and estimated which page was not needed for the longest time.
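Stated in code, the ideal rule is simple, although it requires knowing the future; the Atlas learning algorithm could only approximate it by predicting loop periods. The following minimal Python sketch (illustrative only; the names are mine) evicts the resident page whose next reference lies farthest ahead in a known trace. Belady later formalized this unrealizable benchmark as the MIN policy [1], one of the stack algorithms analyzed by Mattson et al. [7].

```python
# A minimal sketch (not Atlas code) of the ideal replacement rule:
# on a fault, evict the resident page whose next use lies farthest
# in the future. Unrealizable online, since it needs the full trace.

def min_replacement(trace, frames):
    """Simulate optimal replacement; return the number of page faults."""
    resident = set()
    faults = 0
    for i, page in enumerate(trace):
        if page in resident:
            continue                      # hit: no fault
        faults += 1
        if len(resident) < frames:
            resident.add(page)            # a free frame is available
            continue

        def next_use(p):                  # distance to p's next reference
            rest = trace[i + 1:]
            return rest.index(p) if p in rest else float("inf")

        resident.remove(max(resident, key=next_use))
        resident.add(page)
    return faults

print(min_replacement([1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5], frames=3))  # -> 7
```

Any realizable policy must approximate this future knowledge from past behavior; the experiments described next showed which approximations worked.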

The learning algorithm was controversial. It performed well on programs with well-defined loops and poorly on many other programs. The controversy spawned numerous experimental studies well into the 1960s that sought to determine what replacement rules might work best over the widest possible range of programs. Their results were often contradictory. Eventually it became apparent that the volatility resulted from variations in compiling methods: the way in which a compiler grouped code blocks onto pages strongly affected the program's performance under a given replacement strategy.

Meanwhile, in the early 1960s, the major computer makers were drawn to multiprogrammed virtual memory because of its superior programming environment. RCA, General Electric, Burroughs, and Univac all included virtual memory in their operating systems. Because a bad replacement algorithm could cost a million dollars of lost machine time over the life of a system, they all paid a great deal of attention to replacement algorithms.

Nonetheless, by 1966 these companies were reporting that their systems were susceptible to a new, unexplained, catastrophic problem they called thrashing. Thrashing seemed to have nothing to do with the choice of replacement policy. It manifested as a sudden collapse of throughput as the multiprogramming level rose. A thrashing system spent most of its time resolving page faults and little running the CPU. Thrashing was far more damaging than a poor replacement algorithm. It scared the daylights out of the computer makers.

Table 2. Milestones in adoption of locality.

1961: IBM Stretch computer uses spatial locality for instruction lookahead.
1964: Major computer manufacturers (Burroughs, General Electric, RCA, and Univac, but not IBM) introduce virtual memory with their "third-generation computing systems." Thrashing is a significant performance problem.
1965–1969: Nelson, Sayre, and Belady at IBM Research build the first virtual machine operating system; they experiment with virtual machines, contribute significant insights into the performance of virtual memory, mitigate thrashing through load control, and lay the groundwork for later IBM virtual machine architectures.
1968: IBM introduces cache memory in the 360 series. Multics adopts the "clock" replacement algorithm, a variant of LRU, to protect recently used pages.
1969–1972: Operating systems researchers demonstrate experimentally that the working set policy works as claimed. They show how to group code segments on pages to maximize spatial locality and thus more temporal locality during execution.
1972: IBM introduces virtual machines and virtual memory into the 370 series. Bayer formally introduces the B-tree for organizing large files on disks to minimize access time by improving spatial locality. Parnas introduces information hiding, a way of localizing access to variables within modules.
1974–79: IBM System R, an experimental relational database system, uses LRU-managed record caches and B-trees.
1978: First BSD Unix includes virtual memory with load controls inspired by the working set principle; it propagates into Sun OS (1984), Mach (1985), and Mac OS X (1999).
1981: IBM introduces disk controllers containing caches so that database systems can get records without a disk access; the controllers use LRU but do not cache records involved in sequential file accesses.
Early 1980s: Chip makers start providing data caches in addition to instruction caches, to speed up access to data and reduce contention at the memory interface.
Late 1980s: Application developers add "most recent files" lists to desktop applications, allowing users to more quickly resume interrupted tasks.
1987–1990: Microsoft and IBM develop the OS/2 operating system for PCs, with full multitasking and working set managed virtual memory. Microsoft splits from IBM and transforms OS/2 into Windows NT.
Early 1990s: Computer forensics starts to emerge as a field; it uses locality and signal processing to recover the most recently deleted files, and it uses multiple system and network caches to reconstruct the actions of users.
1990–1998: Beginning with Archie, then Gopher, Lycos, Altavista, and finally Google, search engines compile caches that enable finding relevant documents from anywhere on the Internet very quickly.
1993: Mosaic (later Netscape) browser uses a cache to store recently accessed Web pages for quick retrieval by the browser.
1998: Akamai and other companies provide local Web caches ("edge servers") to speed up Internet access and reduce traffic at the sources.

The more conservative IBM did not include virtual memory in its 360 operating system in 1964. Instead, it sponsored at its Watson laboratory one of the most comprehensive experimental systems projects of all time. Led by Bob Nelson, Les Belady, and David Sayre, the project team built the first virtual-machine operating system and used it to study the performance of virtual memory. (The term "virtual memory" appears to have come from this project.) By 1966 they had tested every replacement policy that anyone had ever proposed and a few more they invented. Many of their tests involved the "use bits" built into page tables. By periodically scanning and resetting the bits, the replacement algorithm distinguishes recently referenced pages from others. Belady concluded that policies favoring recently used pages performed better than other policies; LRU (least recently used) replacement was consistently the best performer among those tested [1].
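The use-bit mechanism lends itself to a compact illustration. In the Python sketch below (a toy model with hypothetical names; in a real system the hardware sets a bit in the page table entry on every reference), a periodic scan reports which pages were referenced since the last scan and clears the bits, giving the replacement algorithm exactly the recency evidence Belady found so valuable.

```python
# A toy model (hypothetical structure, not IBM's code) of use-bit
# sampling: hardware sets a per-page bit on each reference; software
# periodically reads and clears the bits to learn which pages were
# recently used.

class Frame:
    def __init__(self, page):
        self.page = page
        self.use_bit = False      # set by (simulated) hardware on reference

def reference(frames, page):
    for f in frames:
        if f.page == page:
            f.use_bit = True      # hardware marks the page as used

def scan_and_reset(frames):
    """Report pages referenced since the last scan, then clear the bits."""
    recent = [f.page for f in frames if f.use_bit]
    for f in frames:
        f.use_bit = False
    return recent

frames = [Frame(p) for p in ("A", "B", "C", "D")]
for p in ("A", "C", "A"):
    reference(frames, p)
print(scan_and_reset(frames))     # -> ['A', 'C']; B and D are eviction candidates
```

The "clock" algorithm that Multics adopted (see Table 2) can be seen as a direct descendant of this scan-and-reset loop.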


DISCOVERY AND PROPAGATION OF LOCALITY IDEA (1966–1980)

In 1965, I entered my Ph.D. studies at MIT in Project MAC, which was just undertaking the development of Multics. I was fascinated by the problems of dynamically allocating scarce CPU and memory resources among the many processes that would populate future time-sharing systems.

I set myself a goal to solve the thrashing problem and define an efficient way to manage memory with variable partitions. Solutions to these problems would be worth millions of dollars in recovered uptime of virtual memory operating systems. Little did I know that I would have to devise and validate a theory of program behavior to accomplish this.

I learned about the controversies over the viability of virtual memory and was baffled by the contradictory conclusions among the experimental studies. I noticed all these studies examined individual programs assigned to a fixed memory partition managed by a replacement algorithm. They did not measure the dynamic partitions used in multiprogrammed virtual memory systems. There was no notion of a dynamic, intrinsic memory demand that would tell us which pages of the program were essential and which were replaceable—something simple like, "this process needs p pages at time t." Such a notion was incompatible with the fixed-space policies everyone was studying. I began to speak of a process's intrinsic memory demand as its "working set." The idea was that paging would be acceptable if the system could guarantee that the working set was loaded. I combed the experimental studies looking for clues on how to measure a program's working set.

In an "Aha!" moment during the waning days of 1966, inspired by Belady's observations, I hit on the idea of defining a process's working set as the set of pages used during a fixed-length sampling window in the immediate past. A working set could be measured by periodically reading and resetting the use bits in a page table. The window had to be in the virtual time of the process—time as measured by the number of memory references made—so that the measurement would not be distorted by interruptions. This led to the now-familiar notation: W(t,T) is the set of pages referenced in the virtual time interval of length T preceding time t [2].

[Figure 1. Locality-sequence behavior diagrammed by a programmer during overlay planning.]

By spring 1967, I had an explanation for thrashing. Thrashing was the collapse of system throughput triggered by making the multiprogramming level too high. It was counterintuitive because we were used to systems that would saturate under heavy load, not shut down. I showed that, when memory was filled with working sets, any further increment in the multiprogramming level would simultaneously push all loaded programs into a regime of working set insufficiency. Programs whose working sets were not loaded paged excessively and could not use the CPU efficiently. I proposed a feedback control mechanism that would limit the multiprogramming level by refusing to activate any program whose working set would not fit within the free space of main memory. When memory was full, the operating system would defer programs requesting activation into a holding queue. Thrashing would be impossible with a working set policy.
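The definition and the feedback control translate directly into code. Here is a minimal Python sketch under stated assumptions: the function names are mine, the full reference trace stands in for the use-bit sampling a real kernel would do, and virtual time is simply the index into the trace.

```python
# A minimal sketch of the working set W(t, T): the distinct pages a
# process referenced in the last T references of its own virtual time.
# Names and the admission test are illustrative, not a kernel policy.

def working_set(trace, t, T):
    """Pages referenced in the virtual-time window (t - T, t]."""
    return set(trace[max(0, t - T):t])

def can_activate(candidate_ws, free_frames):
    """Feedback control: activate a program only if its working set fits
    in free memory; otherwise it waits in a holding queue. This is the
    test that makes thrashing impossible under the working set policy."""
    return len(candidate_ws) <= free_frames

trace = [1, 2, 1, 3, 2, 1, 4, 4, 1, 2]        # one process's page references
print(working_set(trace, t=10, T=4))           # -> the set {1, 2, 4}
print(can_activate({7, 8, 9}, free_frames=2))  # -> False: defer activation
```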

22 July 2005/Vol. 48, No. 7 COMMUNICATIONS OF THE ACM (MIT) and Les Belady (IBM), I ing due to looping and executing a network of servers. (Table 1 lists started using the term “locality” within modules with private data; key milestones.) By 1980, we for the observed tendency of pro- and spatial clustering due to defined locality much the same as grams to cluster references to small related values being grouped into it is defined today [4], in terms of subsets of their pages for extended arrays, sequences, modules, and a distance from a processor to an intervals. We could represent a other data structures. Both these object x at time t, denoted D(x,t). program’s memory demand as a reasons seemed related to the Object x is in the locality set at sequence of locality sets and their human practice of “divide and time t if the distance is less than a holding times: conquer”—breaking a large prob- threshold: D(x,t) ≤ T. Distance lem into parts and working sepa- can take on several meanings: (L1,T1), (L2,T2), ... , (Li,Ti), ... rately on each. The locality bit (1) Temporal: A time distance, maps captured someone’s prob- such as the time since last refer- This seemed natural because we lem-solving method in action. ence, time until next reference, or knew that programmers planned These underlying phenomena even an access time within the storage system or network. (2) Spatial: A space distance, such as the number of hops in a network or number of addresses away in a sequence. (3) Cost: Any non- decreasing function of time since prior reference.

ADOPTION OF LOCALITY PRINCIPLE (1967–PRESENT)

The locality principle was adopted as an idea almost immediately by operating systems, database, and hardware architects. It was applied in ever-widening circles:

• In virtual memory to organize caches for address translation and to design the replacement algorithms.
• In data caches for CPUs, originally in mainframes and now in microchips.
• In buffers between main memory and secondary memory devices.
• In buffers between computers and networks.
• In video boards to accelerate graphics displays.


• In modules that implement the information-hiding principle.
• In accounting and event logs that monitor activities within a system.
• In the "most recently used" object lists of applications.
• In Web browsers to hold recent Web pages.
• In search engines to find the most relevant responses to queries.
• In spread spectrum video streaming that bypasses network congestion and reduces the apparent distance to the video server.
• In edge servers to hold recent Web pages accessed by anyone in an organization or geographical region.

Table 2 lists milestones in the adoption of locality. The locality principle is today considered a fundamental principle for systems design.

FUTURE USES OF LOCALITY PRINCIPLE

The locality principle flows from human cognitive and coordinative behavior. The mind focuses on a small part of the sensory field and can work most quickly on the objects of its attention. People gather the most useful objects close around them to minimize the time and work of using them. These behaviors are transferred into the computational systems we design.

The locality principle will be useful wherever there is an advantage in reducing the apparent distance from a process to the objects it accesses. Objects in the process's locality set are kept in a local cache with fast access time. The cache can be a very high-speed chip attached to a processor, a main memory buffer for disk or network transfers, a directory for recent Web pages, or a server for recent Internet searches. The performance acceleration of a cache generally justifies the modest investment in the cache storage.

Two new uses of locality are worth special mention. First, the burgeoning field of computer forensics owes much of its success to the ubiquity of caches. They are literally everywhere in operating systems and applications. By recovering these caches, forensics experts can reconstruct an amazing amount of evidence about a criminal's motives and intent. Criminals who erase data files are still not safe, because experts use signal-processing methods to recover the faint magnetic traces of the most recent files from the disk. Not even a systems expert can find all the caches and erase them.

Second, a growing number of software designers are realizing that software can be made more robust and less brittle if it can be aware of context. Some have characterized current software as "autistic," meaning that it is uncommunicative, unsocial, and unforgiving. An era of "post-autistic software" can arise by exploiting the locality principle to infer context and intent by watching sequences of references to objects—for example, files, documents, folders, and hyperlinks. The reference sequences of objects, interpreted relative to the semantic webs in which they are embedded, can provide numerous clues to what the user is doing and allow the system to adapt dynamically and avoid unwelcome responses.

Early virtual memory systems suffered from thrashing, a form of software autism. Virtual memories harnessed locality to measure software's dynamic memory demand and adapt the memory allocation to avoid thrashing. Future software may harness locality to help it discern the user's context and avoid autistic behavior.

References
1. Belady, L.A. A study of replacement algorithms for virtual storage computers. IBM Systems J. 5, 2 (1966), 78–101.
2. Denning, P.J. The working set model for program behavior. Commun. ACM 11, 5 (May 1968), 323–333.
3. Denning, P.J. Virtual memory. ACM Computing Surveys 2, 3 (Sept. 1970), 153–189.
4. Denning, P.J. Working sets past and present. IEEE Trans. on Software Engineering SE-6, 1 (Jan. 1980), 64–84.
5. Kilburn, T., Edwards, D.B.G., Lanigan, M.J., and Sumner, F.H. One-level storage system. IRE Transactions EC-11 (Apr. 1962), 223–235.
6. Madison, A.W., and Batson, A. Characteristics of program localities. Commun. ACM 19, 5 (May 1976), 285–294.
7. Mattson, R.L., Gecsei, J., Slutz, D.R., and Traiger, I.L. Evaluation techniques for storage hierarchies. IBM Systems J. 9, 2 (1970), 78–117.
8. Spirn, J. Program Behavior: Models and Measurements. Elsevier Computer Science (1977).
9. Wilkes, M.V. Slave memories and dynamic storage allocation. IEEE Transactions on Computers EC-14 (Apr. 1965), 270–271.

Peter J. Denning ([email protected]) is the director of the Cebrowski Institute for information innovation and superiority at the Naval Postgraduate School in Monterey, CA, and is a past president of ACM.

© 2005 ACM 0001-0782/05/0700 $5.00
