Organization and Performance of a Two-Level Virtual-Real Cache Hierarchy

Wen-Hann Wang, Jean-Loup Baer and Henry M. Levy
Department of Computer Science, FR-35
University of Washington
Seattle, WA 98195

Abstract

We propose and analyze a two-level cache organization that provides high memory bandwidth. The first-level cache is accessed directly by virtual addresses. It is small, fast, and, without the burden of address translation, can easily be optimized to match the processor speed. The virtually-addressed cache is backed up by a large physically-addressed cache; this second-level cache provides a high hit ratio and greatly reduces memory traffic. We show how the second-level cache can be easily extended to solve the synonym problem resulting from the use of a virtually-addressed cache at the first level. Moreover, the second-level cache can be used to shield the virtually-addressed first-level cache from irrelevant cache coherence interference. Finally, simulation results show that this organization has a performance advantage over a hierarchy of physically-addressed caches in a multiprocessor environment.

Keywords: Caches, Virtual Memory, Multiprocessors, Memory Hierarchy, Cache Coherence.

1 Introduction

Virtually-addressed caches are becoming commonplace in high-performance multiprocessors due to the need for rapid cache access [11, 3, 17]. A virtually-addressed cache can be accessed more quickly than a physically-addressed cache because it does not require a preceding virtual-to-physical address translation. However, virtually-addressed caches have several problems as well. For example:

1. They must be capable of handling synonyms, that is, multiple virtual addresses that map to the same physical address.

2. While address translation is not required before a virtual cache lookup, address translation is still needed following a miss.

3. In a multiprocessor system, the use of a virtually-addressed cache may complicate cache coherence because bus addresses are physical; a reverse translation may therefore be required.

4. I/O devices use physical addresses as well, also requiring reverse translation.

5. A virtual cache may need to be invalidated on a context switch because virtual addresses are unique to a single process.

None of these problems is insolvable by itself, and several schemes have been proposed for managing virtual caches. For example, dual tag sets, one virtual and one physical, can be used for each cache entry [7, 6]. As another example, the SPUR system restricts the use of address space, prohibits caching of I/O buffers, and requires bus transmission of both virtual and physical addresses [11]. However, these schemes tend to have performance shortcomings or unpleasant implications for system software. Virtually-addressed caches are fundamentally complicated, and this time or space complexity reduces the ability of the cache to match the ever-increasing needs of modern processors.

To attack this problem, we propose a two-level cache organization involving a virtually-addressed first-level cache and a physically-addressed second-level cache (recent studies of two-level uniprocessor and multiprocessor caches can be found in [4, 5, 12, 13]). The small first-level cache can be fast to meet the requirements of high-speed processors; it is virtually addressed to avoid the need for address translation. The large second-level cache will reduce miss ratios and memory traffic; it is physically addressed to simplify the I/O and multiprocessor coherence problems. Furthermore, we show how the second-level cache can be utilized to solve the synonym problem and to shield the first-level cache from irrelevant cache coherence traffic. Overall, we believe that this two-level virtual-real organization simplifies the design of the first level, where performance is crucial, while solving some of the difficult problems at the second level, where time and space are more easily available.

Our organization involves the use of pointers in the two caches to keep track of the mappings between virtual cache and physical cache entries [7]. We also provide a translation buffer at the second level which operates in parallel with first-level cache lookups in case a miss requires reverse translation. Trace-driven simulations are used to demonstrate the advantages of a two-level V-R (virtual-real) cache over a hierarchy of real-addressed caches in a multiprocessor environment.

The rest of this paper is organized as follows. Section 2 describes the approaches taken in solving various problems related to virtual address caches and presents some design choices for high performance multiprocessor caches. Section 3 gives the specific organization of a V-R two-level cache hierarchy and its detailed operational description. Section 4 presents performance results from simulations, and conclusions are drawn in section 5.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
© 1989 ACM 0884-7495/89/0000/0140 $01.50

2 Design issues of two-level V-R caches for high performance multiprocessors

This section addresses some important issues in the design of two-level V-R caches and motivates our design choices. A more detailed operational description of our approach is given in the following section. The proposed architecture for this evaluation is a shared-bus multiprocessor where each processor has a private, two-level, V-cache-R-cache hierarchy as shown in Figure 1.

Figure 1: Shared-bus organization

Write policies

For a two-level cache, the write policy can be selected independently at each level. In the literature, write-through has been proposed as the most reasonable write policy for the first-level cache in a two-level hierarchy, while write-back is advocated for the second level [10, 8, 13]. A major motivation for the choice of write-through at the first level is that cache coherence control is simplified. In this case, the first- and second-level caches will always contain identical values.

There are several problems, however, with using a first-level write-through cache. First, assuming no write-allocate, write-through caches will have smaller hit ratios than write-back caches. Second, a write takes longer under write-through because the second-level cache must be updated as well; primary memory may also need to be updated depending on the write policy for the second level.

The added write latency of write-through can largely be hidden by the use of write buffers between the first and second levels, but several write buffers may be needed. Table 1, for example, shows that in the execution of the VAX program pops (cf. section 4), 30% of writes are due to procedure calls, each of which typically generates six or more successive writes. Table 2 shows the inter-write interval distribution for a snapshot (411,237 references) of the same trace using a 16K direct-mapped cache with a M-byte block size. As can be seen, the high percentage of short inter-write intervals confirms the need for several buffers.

    writes per call    count    total writes
    1                  3        3
    2                  2        4
    3                  0        0
    4                  2        8
    5                  2        10

Table 1: Number of writes due to procedure calls

    interval           count
    ...                481
    9                  735
    10 and larger      3245

Table 2: Inter-write intervals (snapshot of 411,237 references)

Unfortunately, while write buffers can reduce the write latency of the first-level cache, they re-introduce a complexity that write-through was intended to avoid, namely cache coherence. Write buffers can hold modified data for which other processors might encounter a miss. Thus, cache coherency control must be pro- [...]

The synonym problem

As previously noted, a two-level V-R organization can be used to solve the synonym problem. The solution requires the use of a reverse translation table [15] for detecting synonyms, and a natural place to put that table is at the second level.

Our two-level organization permits and detects synonyms, but guarantees that at most one copy of a data element exists in the V-cache at any time. Each second-level cache block will have a pointer to its first-level child block, if one exists. If we guarantee an inclusion property, where the R-cache contains a superset of the tags in the V-cache, the reverse translation information can be stored in log2(V-cache size / page size) superset bits in each R-cache block. For each entry in the R-cache with a child in the V-cache, these extra bits, together with the page offset, provide the V-cache location of its child.

When a miss occurs in the V-cache, the virtual address is translated (using a second-level translation buffer) and the R-cache is accessed. If an R-cache hit occurs, the R-cache checks whether the data is also in the V-cache under another virtual address (a synonym). If so, it simply invalidates that V-cache copy and moves the data to the new virtual address in the V-cache. Thus, while a data element can have synonyms, it is always stored in the V-cache using the last virtual address with which it was accessed.¹

¹ Note that our approach in dealing with the synonym problem has some similarities to Goodman's approach [7].
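The superset bits and the synonym handling on a V-cache miss can be illustrated with a small Python model. This is only a sketch under stated assumptions, not the organization evaluated in the paper: the page, block and V-cache sizes are arbitrary, the V-cache is taken to be direct-mapped, the R-cache is an unbounded dictionary (so inclusion holds trivially), only reads are modeled, and every name (`superset_bits`, `child_index`, `TwoLevelCache`, and so on) is hypothetical.

```python
PAGE_SIZE = 4096           # assumed page size (bytes)
BLOCK_SIZE = 16            # assumed block size (bytes)
VCACHE_SIZE = 16 * 1024    # assumed direct-mapped V-cache size (bytes)
BLOCKS_PER_PAGE = PAGE_SIZE // BLOCK_SIZE


def superset_bits(vcache_size, page_size):
    """log2(V-cache size / page size): extra bits kept in each R-cache block."""
    return max(1, vcache_size // page_size).bit_length() - 1


def vcache_index(vaddr):
    """Direct-mapped V-cache index of a virtual address."""
    return (vaddr % VCACHE_SIZE) // BLOCK_SIZE


def superset_of(vaddr):
    """The part of the V-cache index that lies above the page offset."""
    return (vaddr % VCACHE_SIZE) // PAGE_SIZE


def child_index(superset, paddr):
    """Rebuild the child's V-cache index from superset bits plus page offset."""
    return superset * BLOCKS_PER_PAGE + (paddr % PAGE_SIZE) // BLOCK_SIZE


class TwoLevelCache:
    def __init__(self, page_table, memory):
        self.page_table = page_table  # virtual page number -> physical page number
        self.memory = memory          # physical block address -> data
        self.vcache = {}              # V-cache index -> {"vblock", "pblock", "data"}
        self.rcache = {}              # physical block address -> {"data", "child"}

    def translate(self, vaddr):
        """Stand-in for the second-level translation buffer."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        return self.page_table[vpn] * PAGE_SIZE + offset

    def read(self, vaddr):
        vblock = vaddr - vaddr % BLOCK_SIZE
        idx = vcache_index(vblock)
        line = self.vcache.get(idx)
        if line is not None and line["vblock"] == vblock:
            return line["data"], "V-cache hit"

        # V-cache miss: translate, then access the R-cache.
        pblock = self.translate(vblock)
        rline = self.rcache.get(pblock)
        if rline is None:
            rline = {"data": self.memory.get(pblock), "child": None}
            self.rcache[pblock] = rline
            outcome = "R-cache miss"
        elif rline["child"] is not None:
            # The block is in the V-cache under another virtual address:
            # invalidate that copy before installing the new one.
            del self.vcache[child_index(rline["child"], pblock)]
            outcome = "synonym moved"
        else:
            outcome = "R-cache hit"

        # Evict the current occupant of the target slot, if any, and clear
        # the child pointer held by its R-cache parent.
        victim = self.vcache.pop(idx, None)
        if victim is not None:
            self.rcache[victim["pblock"]]["child"] = None

        self.vcache[idx] = {"vblock": vblock, "pblock": pblock, "data": rline["data"]}
        rline["child"] = superset_of(vblock)
        return rline["data"], outcome
```

With the sizes assumed here, `superset_bits(VCACHE_SIZE, PAGE_SIZE)` evaluates to 2. Mapping two virtual pages to the same physical page and reading the same offset through each in turn exercises the synonym path: the second read invalidates the first V-cache copy, so the V-cache never holds more than one copy of the block.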
