External Cache for the RM5271

Application Note Introduction

Quantum Effect Devices, Inc. (QED) was founded in 1991 to design and develop MIPS RISC microprocessors to MIPS Technologies, Inc. (MTI) specifications. MTI sub-licensed those designs to MIPS licensees. In 1996 QED obtained a license to manufacture and sell MIPS microprocessors. QED is currently producing three 64-bit MIPS microprocessors that support external cache: the RM5270, RM5271, and RM7000. The R5000, a QED designed and licensed product, also supports external cache. This paper is applicable to all four devices, but only the RM5271 will be directly referenced herein.

This paper will discuss only the external cache, its implementation, and enhancements made to the secondary cache controller. For the RM5270, RM5271, and R5000 the external cache is a second level cache. For the RM7000 the external cache is a third level cache.

The RM5271 provides an on-chip controller for external second level cache. The second level cache is a unified direct-mapped block write-through cache with byte parity protection. It requires two types of pipelined synchronous burst SRAM, a Tag SRAM and a Data SRAM. The Tag SRAM is used to maintain a list of cache lines stored in the Data SRAM plus a single bit with each Tag to define whether the cache line that tag points to, or indexes, in the Data SRAM contains valid data or not. The Data SRAM must have a parity bit for each byte since the system interface uses even parity to protect data transfers.

Not all transactions on the System Interface reference the external secondary cache. The external cache is not accessed for a double word, partial double word, word, or partial word read or write. The external cache is accessed only for memory space with the cache coherency of “Write Back”. Memory space with cache coherency of “Write Through With Allocate”, “Write Through Without Allocate” or “Uncached” does not make use of the external cache. Cache coherency and data movements are charted on the last page of this document.

Pipeline Burst SRAM versus Flow-through SRAM

Synchronous Burst (SyncBurst) SRAMs come in two flavors. Flow-through SRAM latches the address on one clock cycle and drives the data on the next clock cycle. Because of this timing, flow-through parts tend to be bandwidth limited. None of QED’s microprocessors can operate with Flow-Through Burst SRAM, so it will not be discussed further.

Pipeline Burst SRAM latches both the input and output data lines. For a read, on the first clock cycle following the latching of the address, the data is latched into the output latch. On the following cycle the data is driven out. This means that for a burst of 32-bytes, first data is fetched in three cycles, with each sequential double word fetched in a single cycle. This results in a cache line being read from memory in 3-1-1-1 clock cycles.
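As a quick arithmetic check of the 3-1-1-1 pattern, the following sketch totals the clocks needed to read one 32-byte line. The function name and its parameters are illustrative only and do not come from any QED or SRAM vendor document.

```python
def burst_read_cycles(first_access=3, subsequent=1, line_bytes=32, bus_bytes=8):
    """Total clocks to read one cache line from pipelined burst SRAM.

    first_access: clocks until the first double word is available (3).
    subsequent:   clocks for each following double word (1).
    """
    beats = line_bytes // bus_bytes  # 4 double words in a 32-byte line
    return first_access + subsequent * (beats - 1)


print(burst_read_cycles())  # 3 + 1 + 1 + 1
```

This counts SRAM access clocks only; the bus hand-over cycles seen at the RM5271 system interface are not included.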

There are two types of Pipeline Burst SRAM – Single Cycle Deselect (SCD) and Dual Cycle Deselect (DCD). The R5000, being an early design, can only operate with Dual Cycle Deselect (DCD) SyncBurst SRAM. The RM5270, RM5271, and RM7000 can operate with either Dual Cycle Deselect (DCD) or Single Cycle Deselect (SCD) Synchronous Burst SRAM. Single Cycle Deselect (SCD) is the commodity SyncBurst SRAM. Mode bit 15 is set to a one to select SCD timing for SyncBurst SRAM.

To find out more about SyncBurst SRAM visit Micron Technology’s web site at www.micron.com. They have about half a dozen technical papers on Synchronous Burst SRAM.

Synchronous Burst SRAMs for the Data portion of the external cache are supplied by Micron Technology (www.micron.com), Motorola (www.moto.com), GSI Technology (www.gigasemi.com), and Galvantech (www.gvti.com). Motorola (MCM69T618), Galvantech (GVT7164T18), and GSI Technology (GS84118T/B) supply tag SRAMs. There may be other suppliers of these two types of SRAMs.

QUANTUM EFFECT DEVICES, INC., 2500-5 AUGUSTINE DRIVE, SUITE 200, SANTA CLARA, CA 95054 408.565.0300 www.qedinc.com

To understand how to implement an external cache you first have to become familiar with the interactions between it and the RM5271.

Secondary Cache Transactions

The RM5271 performs only five operations on the external cache. Three of these operations are initiated by the CACHE instructions while the other two operations are block read and block write. The secondary cache is not accessed for non-block transfers.

For a read the RM5271 performs a sub-block ordered interleaved access. This allows critical data to be returned first (refer to section 11.20.2 of the RM5200 Family User Manual) allowing the RM5271 to resume execution when it receives the first double word of a requested block of data. For a block write the RM5271 performs a linear sequential access.
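The two access orders can be sketched as follows. This assumes the usual MIPS sub-block ordering, in which the n-th beat returns the double word whose index is the critical index XORed with n, and it assumes block writes start at double word 0. Both function names are hypothetical; confirm the ordering against the RM5200 Family User Manual before relying on it.

```python
def sub_block_order(critical_dw):
    """Double-word order for a sub-block ordered (interleaved) block read.

    Assumption: beat n returns double word (critical_dw XOR n).
    """
    return [critical_dw ^ n for n in range(4)]


def linear_order():
    """Double-word order for a block write (assumed to start at double word 0)."""
    return [0, 1, 2, 3]


# The requested (critical) double word always comes back first.
for dw in range(4):
    print(dw, sub_block_order(dw))
```

Because the read order depends on which double word was requested while the write order does not, a single fixed burst sequence in the SRAM cannot serve both.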

Reads

A block read takes place in external cache following an attempt to fetch an instruction from the primary instruction cache or data from the primary data cache. When a read attempt fails to find the requested data in the primary caches, the RM5271 issues a speculative block read to both the external cache and the External Agent.

If the block is present in the external cache, (i.e. the cache read hits) the external cache Tag SRAM will assert ScMatch telling the CPU the requested data are found in the external cache and telling the External Agent to abort its read attempt. If the block is not present in the external cache, (i.e. the cache read misses) the ScMatch signal is not asserted. Seeing ScMatch negated at the third cycle, the CPU gives up on reading the data out of the external cache, and yields the system interface to the External Agent so that it can provide the requested data. The data fetched by the external agent will be written to both the external cache and the RM5271.

When the RM5271 requires instructions or data that are not in the primary caches, it presents the physical address to the tag SRAM, the data SRAM, and the External Agent. All three start a read cycle with this address and read command. If the required data or instruction is in the external data SRAM, then there will be a matching tag in the tag SRAM. This match within the tag SRAM will cause it to drive the ScMatch signal. This signal tells the external agent to cease its read efforts and tells the processor that the secondary cache will be supplying the requested data. When the external cache drives data onto the SysAD bus, it does not have the ability to generate a ValidIn* signal. Therefore, the external cache SRAMs must adhere to the timing requirements of the RM5271.

To reduce the time to read the 4 double words from external cache, the data SRAM drives the first of four double words on the same cycle that the Tag SRAM asserts ScMatch. A read that hits in external cache only takes 7 system clock cycles; one for the address, one to give bus ownership to the external cache, one for the first data double-word and ScMatch, 3 for the remaining 3 data double-words, and one to return bus ownership back to the RM5271.

If the required data or instruction is not present in the external cache, that is, the cache read misses, then the tag SRAM will not assert ScMatch two cycles after receiving the Index portion of the address. In this case the RM5271 gives bus ownership to the external agent, which continues its read. When the external agent has the data at its system interface, it will cause it to be written both into the data SRAM and the RM5271. The RM5271 will drive the signals necessary to write the corresponding new tag into the tag SRAM.

Issuance of the external cache read is controlled by the normal RdRdy* flow control mechanism. External cache read responses always proceed at the maximum data transfer rate. The External Agent read responses to the external cache proceed at the data rate generated by the External Agent.

Writes

The RM5271 uses a block write-through protocol to write data into the external cache. The RM5271 issues a block write operation that is directed to both the external cache and the external agent. On the same cycle that it drives the address, the processor also drives the cache line’s tag on SysAD[35:19] and the external valid bit, ScValid. This causes a valid Tag to be placed in the tag SRAM. The RM5271 then drives four double words of data onto the system interface and causes them to be latched into both external cache and the external agent.

Since external cache writes also go to the external agent, they proceed at the data transfer rate specified in the boot time mode bits (4:1) for write back data rates. Like non-block writes, their issuance is controlled by the normal WrRdy* flow control mechanism. It is these writes going to both the external cache and the external agent that cause the external cache to be referred to as a block write-through cache. That is, block writes go through the external cache to the external agent.

Block reads and block writes are two of the five operations that can be transacted with the external cache. The other three operations (cache invalidate, cache clear and cache probe) involve only the RM5271 and the external cache. The RM5271 and the External Agent jointly control cache read and cache write operations.

Invalidate

At start-up the contents of the data SRAM and tag SRAM are unknown. Before the external cache can be used it must be brought to a known state with all of its valid bits cleared. This function is called a cache invalidate operation.

There are three means to achieve this. The fastest and simplest is to execute a Flash Clear operation of the valid bit column. This is done using a CACHE instruction with bits [20:18] = 0 and bits [17:16] = 3 (Flash Invalidate). This CACHE instruction causes the ScCLR* signal pin to be asserted. As with everything simple, there is a price to pay. This operation can only be performed on tag SRAM that has a column clear function. At the time this article was written, no known tag SRAM supported this function.

The second means to invalidate the external cache is to clear the state bits throughout the tag SRAM. This also can be done in two ways. One is to use the CACHE instruction with bits [20:18] = 2 and bits [17:16] = 3 (Index Store Tag). This will store the CP0_TagLo register to the indexed tag line. To invalidate 2 Mbytes of external cache you would have to repeat this instruction 65,536 times. Therefore, this routine is generally not used at power-up, but instead by a cache handler routine, to flush no longer needed data from the external cache. Before employing this CACHE instruction zeros must be written to the CP0_TagLo register.

The third, and faster, way to invalidate the external cache is with a CACHE instruction with bits [20:18] = 5 and bits [17:16] = 3 (Page Invalidate). This will store the CP0_TagLo register to 128 tags for each instruction execution. To invalidate 2 Mbytes of external cache you will have to repeat this instruction 512 times. One half Mbytes of external cache would only require 128 repetitions of this CACHE instruction. Before employing this CACHE instruction all zeros must be written to the CP0_TagLo register.
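The repetition counts quoted above follow directly from the 32-byte line size and the 128 tags written per Page Invalidate. The sketch below reproduces them and also packs the two bit fields into the 5-bit CACHE op value (instruction bits [20:16]). The function names are illustrative, not part of any QED toolchain.

```python
CACHE_LINE_BYTES = 32           # one tag per 32-byte cache line
TAGS_PER_PAGE_INVALIDATE = 128  # tags written by one Page Invalidate


def cache_op_field(bits_20_18, bits_17_16):
    """Pack the two fields into the 5-bit op value held in bits [20:16]."""
    return ((bits_20_18 & 0x7) << 2) | (bits_17_16 & 0x3)


def index_store_tag_count(cache_bytes):
    """CACHE instructions needed when invalidating one tag at a time."""
    return cache_bytes // CACHE_LINE_BYTES


def page_invalidate_count(cache_bytes):
    """CACHE instructions needed when invalidating 128 tags at a time."""
    return index_store_tag_count(cache_bytes) // TAGS_PER_PAGE_INVALIDATE


two_mb = 2 * 1024 * 1024
print(index_store_tag_count(two_mb))        # 65536
print(page_invalidate_count(two_mb))        # 512
print(page_invalidate_count(512 * 1024))    # 128
print(hex(cache_op_field(2, 3)))            # 0xb, Index Store Tag
```

The packed value 0xb for Index Store Tag agrees with the Index_Store_Tag_S definition used in the test program at the end of this note.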

Probe

The final external cache transaction that involves only the external cache and the RM5271 is the cache probe. This cache operation is used by a cache handler to check whether a particular cache line is present in the secondary cache. This is achieved with a CACHE instruction with bits [20:18] = 0 and bits [17:16] = 3. Also, the first three clock cycles of a normal block read of the external cache constitute a cache probe operation.

The External Agent’s role

The role of the external agent is to provide the required data or instruction when the requested cache line is not in the external cache. There are two reasons for the requested cache line not to be in external cache. The first is that the line simply has yet to be read out of main memory. In this case, the cache line read misses in the external cache, so the external agent fetches the required cache line from main memory and writes it to the external cache and the RM5271.

In the second case the required data does not exist in Write-Back memory space, but resides instead in Write-Through memory space. In this case the external cache is not accessed and the block read goes directly to the external agent. To detect this case the external agent has to monitor the ScTCE* pin. The ScTCE* pin is driven low along with ValidOut* during the address issue cycle to denote a block read that will access the external cache. For a block read that does not access the external cache, the ScTCE* pin is maintained high.

The ScTCE* pin will also be asserted for the address issue cycle of a block write that accesses the external cache and maintained de-asserted for a block write that does not access the external cache.

The external cache is accessed only via block reads and block writes. Non-block accesses never reference the external cache.

When a block read address is placed on the SysAD bus, both the external cache and the external agent start retrieving the cache line. Two clock cycles later the external cache determines whether or not it contains the requested cache line. It indicates this by asserting or not asserting the ScMatch pin at the first data double word cycle. If the match fails and the external cache does not assert ScMatch, then the external agent continues with its cache line fetch. At this point the RM5271 gives up the ScWord[1:0] lines so that the external agent may control them during its write of the requested cache line. Once the write of the cache line by the external agent is complete, the system interface is returned to the RM5271.

Implementing External Cache

The RM5271 has a 64-bit multiplexed address and data bus (SysAD[63:0]), 8 bits of even parity (SysADC[7:0]), and a 9-bit command bus (SysCmd[8:0]). With these three buses and 6 control lines the RM5271 conducts all required transactions with an external agent.

To support external secondary cache the RM5271 provides the ScLine[15:0] bus, which is the index into both the data SRAM and the tag SRAM. That is, each unique address on the ScLine pins points to a cache line position within the data SRAM. The RM5271 also provides two address bits to point to double words within a cache line. These two lines are the ScWord[1:0] lines.

All accesses to the external cache are cache line in size, that is, one 32-byte cache line of either instructions or data. The RM5271 cannot do a double word, partial double word, word, partial word, or byte access to the Secondary Cache.

The RM5271 drives a physical address of 36 bits onto the SysAD[35:0] pins. For the secondary cache that address range is broken down into 4 segments. The upper 17 bits (SysAD[35:19]), along with the ScValid bit, are used as cache line tags. The next lower 14 bits (SysAD[18:5]), plus the two lower bits of the tag portion of the physical address (SysAD[20:19]), are re-driven as ScLine[15:0] and are used to index into the secondary cache. The next lower two bits (SysAD[4:3]) are re-driven as ScWord[1:0] and are used to address the four double words of a cache line. The lower 3 bits specify the individual bytes in an 8-byte double word and are always driven as zero for cache line accesses.

For each cache line stored in the data cache there has to be a corresponding cache tag in the Tag SRAM. A cache tag is that portion of an address line (SysAD[35:19]) that, when concatenated with the cache index lines (ScLine[15:0]) uniquely defines a cache line address, plus a single bit (ScValid) that specifies the validity of that cache line.

Physical Address (SysAD[35:0])

Field               Bits     Signal          Width
Cache Line TAG      35:19    PAddr[35:19]    17 bits
Cache Line Index    20:5     ScLine[15:0]    16 bits
Double-Word Index   4:3      ScWord[1:0]     2 bits
Byte                2:0      -               3 bits

The TAG and Cache Line Index fields overlap at bits 20:19.
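The field boundaries described above can be expressed as a short sketch that splits a 36-bit physical address into its tag, ScLine, ScWord and byte fields. The function name is hypothetical; the bit positions are those given in this section.

```python
def split_physical_address(paddr):
    """Split a 36-bit physical address into external cache fields.

    Returns (tag, sc_line, sc_word, byte_offset).
    """
    tag = (paddr >> 19) & 0x1FFFF    # SysAD[35:19], 17 bits
    sc_line = (paddr >> 5) & 0xFFFF  # SysAD[20:5], re-driven as ScLine[15:0]
    sc_word = (paddr >> 3) & 0x3     # SysAD[4:3], re-driven as ScWord[1:0]
    byte_offset = paddr & 0x7        # SysAD[2:0], zero for cache line accesses
    return tag, sc_line, sc_word, byte_offset


tag, line, word, byte = split_physical_address(0x12345678)
print(hex(tag), hex(line), word, byte)  # 0x246 0xa2b3 3 0
```

Note that bits 20:19 appear in both the tag and the index, which is why the same tag SRAM wiring can serve several cache sizes.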

On the RM5271, as on the R5000, with secondary cache, the ScValid line is tied to the Tag SRAM as the 18th bit. When the RM5271 issues a block read, it also drives ScValid high. This bit is compared in the Tag SRAM in the same way as the tag bits. If the tag exists in the secondary cache but the line is invalid, the ScValid bit stored in the tag RAM will be negated (low). Therefore, a probe will fail.

External Cache SRAM Requirement

The RM5271 stores one 18-bit tag for each 32-byte cache line stored in External Cache. The RM5271, with a 64-bit multiplexed address and data bus, gets 8-bytes for each double word bus fetch. This translates to 4 double-word bus cycles for one 32-byte cache line.

The way to determine the size of Tag RAM is to divide the size of the Secondary Cache by the size of the cache line. For 1/2 MB of Secondary Cache this is 512 K bytes divided by 32 bytes, which results in 16K x 18 RAM needed for the Tag memory. (1MB requires 32K x18, 2MB requires 64K x18, 4MB requires 128K x18, 8MB requires 256K x18)

Each entry in the Tag RAM points to a cache line position in the Data RAM. There is one Tag in the Tag RAM for each 32-byte cache line in the Data RAM. The following table shows how to partition the SysAD lines between the external secondary cache Tag RAM and Data RAM.

L2 Cache Size   TAG SRAM bits   Data SRAM bits   TAG SRAM Size   Data SRAM Size
512 KB          35:19           18:5             16K x 18        64K x 72
1 MB            35:20           19:5             32K x 18        128K x 72
2 MB            35:21           20:5             64K x 18        256K x 72

If you are planning on using a 64K x 18 device for the Tag RAM, since it allows the use of 512K, 1M, or 2M of external secondary cache, you can tie SysAD19 and SysAD20 to both the Tag and Data RAM without adverse effect. The thing to be cautious of here is that if you implement an external cache of only 512 Kbytes, then you cannot attach ScLine15 and ScLine14 to the Tag SRAM, as this would put four images of the tags into the Tag SRAM. So you would connect ScLine14 and ScLine15 to the Tag SRAM via jumpers, with pull-ups or pull-downs to hold the pins static. The jumpers allow the external cache size to be determined at assembly.

The RM5271 has a 64-bit (8-byte) multiplexed address and data bus. The external cache size is selectable by the system designer to be 512 Kbytes, 1 Mbyte, or 2 Mbytes. To determine the depth of external SRAM needed to implement a secondary cache, the system designer takes the desired secondary cache size and divides it by 8, the number of bytes in the SysAD bus width. Therefore, a 512 KB external cache requires a memory depth of 64K, with a width of 8 bytes. Due to the restrictions on loading of the SysAD bus these devices must be at least nKx18. A parity bit must be stored with each byte, because the RM5271 protects data transfers on the System Interface with even parity and the external cache has no way to generate parity.
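The two sizing rules (one tag per 32-byte line, one 72-bit data word per 8 bytes) can be checked with a few lines of arithmetic. This is a sketch with a hypothetical function name, not a sizing tool.

```python
def sram_requirements(cache_bytes):
    """Return (tag_entries, data_depth) for a given external cache size.

    tag_entries: number of 18-bit tag words (17 tag bits plus ScValid).
    data_depth:  number of 72-bit data words (64 data bits plus 8 parity bits).
    """
    tag_entries = cache_bytes // 32  # one tag per 32-byte cache line
    data_depth = cache_bytes // 8    # one entry per 8-byte SysAD transfer
    return tag_entries, data_depth


for size_kb in (512, 1024, 2048):
    tags, depth = sram_requirements(size_kb * 1024)
    print(f"{size_kb} KB: tag {tags // 1024}K x 18, data {depth // 1024}K x 72")
```

The printed rows match the Tag SRAM Size and Data SRAM Size columns of the table above.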

A closer look at Tag SRAMs

A Tag SRAM is a SRAM with a compare circuit on chip so that a stored address fragment can be compared to a portion of the requested address and the validity of the corresponding cache line ascertained. The content of the Tag RAM is an array of Tags (upper address bits) stored in the SRAM plus a valid bit to denote cache line validity.

A Tag SRAM can be used as Data SRAM, but Data SRAM cannot be used as Tag SRAM. The Tag SRAM differs from the Data SRAM in that it has a compare circuit built on chip. The way the Tag SRAM works is that the Tag portion of the address is latched onto the chip, but not stored into the memory array. At the same time that the Tag portion of the address is latched on-chip, the Tag SRAM, using the ScLine pins, starts a read of the contents at that address. The Tag fetched from the memory array is not driven out, but is compared to the Tag portion previously latched. If these two are equal, then the cache line requested exists in the Data cache and ScMatch is asserted. If these two are not equal, then the cache line requested is not in the Data cache and ScMatch is negated.
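The compare behavior can be modeled in a few lines. This is a behavioral sketch only: the class and method names are invented for illustration, timing is ignored, and the model starts with every entry invalid, whereas a real Tag SRAM powers up with unknown contents.

```python
class TagSram:
    """Behavioral model of a direct-mapped Tag SRAM with on-chip compare."""

    TAG_MASK = 0x1FFFF  # 17 tag bits, SysAD[35:19]
    VALID_BIT = 1 << 17  # ScValid stored as the 18th bit

    def __init__(self, entries):
        self.entries = entries
        self.array = [0] * entries  # model assumption: all entries start invalid

    def write(self, index, tag, valid=1):
        """Store the latched tag and ScValid into the memory array."""
        word = (tag & self.TAG_MASK) | (self.VALID_BIT if valid else 0)
        self.array[index % self.entries] = word

    def probe(self, index, tag):
        """Return True (ScMatch asserted) when the stored word matches.

        On a read the processor presents the tag with ScValid high, so an
        invalid entry can never match.
        """
        presented = (tag & self.TAG_MASK) | self.VALID_BIT
        return self.array[index % self.entries] == presented


tags = TagSram(16 * 1024)
tags.write(5, 0x123)
print(tags.probe(5, 0x123), tags.probe(5, 0x124))  # True False
```

Treating ScValid as just another compared bit is what lets an ordinary comparator reject invalid lines without any extra logic.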

[Figure: Tag SRAM block diagram. Inputs: ScCWE* (write), ScTCE* (device enable), SysAD[35:19] and ScValid (data latch), ScLine[15:0] (address latch feeding the memory array), ScTOE* (read, output gate). Output: ScMatch (match, driven by the comparator).]

The SysAD[35:19] pins and ScValid make up the cache tag and, thus, are the data input to the Tag SRAM. The ScLine[15:0] pins are the cache index and get attached to the Tag SRAM address pins. The ScTOE* pin, driven by the RM5271, controls the gating of data from the Tag SRAM onto the SysAD bus.

The ScTCE* pin, driven by the RM5271, is the Tag SRAM chip enable pin. When asserted, this signal will cause either a probe or a write of the Tag RAM, depending on the state of the Tag RAM’s write enable signal. If the write signal, ScCWE[1:0]*, is negated when the chip enable signal, ScTCE*, is asserted, then a probe will take place. If ScCWE[1:0]* is asserted when ScTCE* is asserted, then the contents of the input data register will be stored to the memory array. If ScCWE[1:0]*, ScTCE*, and ScTDE* are all asserted simultaneously, then an error has been detected and the ScValid line will be driven de-asserted. If you are implementing an External Agent, ScTCE* should be monitored, as it indicates that a secondary cache access is occurring.
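The probe/write selection described above amounts to a small truth table. The sketch below encodes it for the two signals ScTCE* and ScCWE* only; the ScTDE* error case is deliberately left out, and the function name and returned strings are illustrative.

```python
def tag_sram_operation(sc_tce_n, sc_cwe_n):
    """Decode the Tag SRAM operation from two active-low pins.

    Both arguments are pin levels: 0 means asserted, 1 means negated.
    """
    if sc_tce_n != 0:
        return "idle"   # chip not enabled, no tag access
    if sc_cwe_n == 0:
        return "write"  # latched tag is stored to the memory array
    return "probe"      # stored tag is compared, result on ScMatch


for tce in (0, 1):
    for cwe in (0, 1):
        print(f"ScTCE*={tce} ScCWE*={cwe}: {tag_sram_operation(tce, cwe)}")
```

An External Agent that watches ScTCE* can use the same decode to tell whether a given block transfer involves the external cache.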

The ScTDE* pin, when asserted by the RM5271, causes the value on the data inputs of the Tag RAM to be latched into the Tag RAM’s data-in register. The value is held there during the probe portion of a read operation. If a read probe fails and a refill of the Data RAM is necessary, this latched value will later be written into the Tag RAM memory array. Latching the Tag allows a shared address/data bus to be used without incurring a penalty to re-present the Tag during the refill sequence.

The ScCWE[1:0]* pins are the secondary cache write enable pins. When asserted low these pins cause a write to both the Tag and Data RAMs. Two pins are provided to balance the capacitive load relative to the remaining cache interface signals.

The ScMatch pin is an input of the Match pin from the Tag SRAM. The Match pin informs the RM5271 and the External Agent whether a read attempt hit in the data cache or not. On a miss the external agent proceeds with its fetch of the cache line and the RM5271 gives up the system interface. On a hit, the RM5271 proceeds with the reading of the cache line out of external cache and the external agent gives up on its fetch operation.

A closer look at Data SRAMs

Typical Pipeline Burst SRAMs (PBSRAMs) used by the RM5271 have an integrated x36 core with synchronous peripheral circuitry and a 2-bit burst address counter. x36 PBSRAMs are required because the RM5271 protects its system interface data with even byte parity. Since the external cache has no way of generating parity, a bit of parity must be stored with each byte.

All PBSRAM synchronous inputs pass through registers that are latched on the rising edge of a clock input (CLK). The synchronous inputs include all addresses, all data inputs, active LOW chip enable (CE#), two additional chip enables for depth expansion (CE2, CE2#), burst control inputs (ADSC#, ADSP#, ADV#), byte write enables (BW1#, BW2#, BW3#, BW4#, BWE#) and global write (GW#).
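For readers who want to check stored parity values, the sketch below computes even byte parity for one 64-bit double word. It assumes that SysADC[n] protects SysAD[8n+7:8n], which is consistent with the pin assignment table later in this note; the function names are hypothetical.

```python
def even_parity_bit(byte):
    """Parity bit that makes the total number of 1s (byte plus parity) even."""
    return bin(byte & 0xFF).count("1") & 1


def sysadc_for(doubleword):
    """Compute the 8 parity bits for a 64-bit double word.

    Assumption: bit n of the result protects byte lane n (bits 8n+7..8n).
    """
    parity = 0
    for lane in range(8):
        byte = (doubleword >> (8 * lane)) & 0xFF
        parity |= even_parity_bit(byte) << lane
    return parity


print(even_parity_bit(0x01), even_parity_bit(0x03))  # 1 0
print(hex(sysadc_for(0x0100)))                       # 0x2
```

Storing these eight bits alongside the 64 data bits is what makes each data word 72 bits wide.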

PBSRAM asynchronous inputs include the output enable (OE#), clock (CLK) and snooze enable (ZZ). There is also a burst mode pin (MODE) that selects between interleaved and linear burst modes. The data-out (Q), enabled by OE#, is also synchronous. The PBSRAM supports write cycles from 1 to 4 bytes wide as controlled by the write control inputs. The RM5271 always writes a double-word to the external cache. Thus, 4 byte wide writes are the only version used.

Burst operation of PBSRAM can be initiated with either the address status processor (ADSP#) or address status controller (ADSC#) input pin. Subsequent burst addresses can be internally generated and controlled by the burst advance pin (ADV#). The type of burst addressing used, linear or interleaved, is selected by the MODE pin. Since the RM5271 does a linear write and an interleaved read, the PBSRAM is not used in burst mode. Instead the PBSRAM is set up to use an external address, and the RM5271 supplies a new address on each cycle for four clock cycles. To the RM5271 this is a burst transfer, but to the PBSRAM this is 4 single accesses on successive clock cycles.

The PBSRAM registers address and write control on-chip to simplify WRITE cycles. This allows self-timed WRITE cycles. Individual byte enables allow bytes to be written, but are not used since all access to the external cache is double-word in size. Instead GW# is asserted low to cause all bytes to be written.

System Initialization

Mode bits

The RM5271, like its MIPS predecessors, implements a serial mode bit stream. At boot-up 256 bits of serial data are shifted into the RM5271 using the ModeIn and ModeClock pins. These bits are used to configure the RM5271 to the system design configuration. They set such things as Burst Write Back Rate, SysClock to Pclock multiplier, I/O drive strength, et cetera.

There are several mode bits that affect the Secondary Cache. Mode bit 12 is inverted and loaded into the Config register at bit 17 (Config.SC). This bit (Config.SC) being low informs the operating system that there is secondary cache attached. The operating system, after reading config register bit 17 as low, and wanting to utilize the external cache, would set config register bit 12 (Config.SE). This enables the secondary cache interface pins. At power-on config register Secondary Cache Enable bit (Config.SE) is set to a zero, holding the RM5271 Secondary Cache output pins in tri-state.
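The Config register handling described above can be sketched as bit operations. The bit positions for Config.SC (17) and Config.SE (12) are from this section; the kseg0 coherency value of 3 for write-back mirrors the `ori r1,r1,0x1003` in the test program at the end of this note. The constant and function names are illustrative.

```python
CONFIG_SC = 1 << 17   # 0 = secondary cache present (inverted mode bit 12)
CONFIG_SE = 1 << 12   # 1 = secondary cache interface enabled
CONFIG_K0_MASK = 0x7  # kseg0 cache coherency field
K0_WRITEBACK = 0x3    # write-back coherency, as used by the test program


def external_cache_present(config):
    """True when Config.SC reads low, meaning external cache is attached."""
    return (config & CONFIG_SC) == 0


def enable_external_cache(config):
    """Return a Config value with SE set and kseg0 set to write-back."""
    return (config & ~CONFIG_K0_MASK) | CONFIG_SE | K0_WRITEBACK


config = 0x00000005  # example value only, not a real power-on value
if external_cache_present(config):
    config = enable_external_cache(config)
print(hex(config))  # 0x1003
```

On real hardware the new value would be written back with mtc0, as the test program does.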

If the external cache is not going to be enabled, the system designer does not need to drive the external cache input pins ScMatch and ScDOE* to a known state. The RM5271 has weak pull-ups on these two pins when external cache is not enabled.

Another mode bit of significance to the external cache is bit 15. When this bit is a zero, the ScDCE* pin is negated at t5 time so as to operate with Dual Cycle Deselect (DCD) pipelined burst SRAM. If this bit is a one, ScDCE* is negated at t6 time, allowing the RM5200 and RM7000 to operate with Single Cycle Deselect pipelined SyncBurst SRAMs. This may seem like a minor difference, but it allows for the use of lower cost commodity SRAM in the Secondary Cache.

On a SRAM data sheet the nomenclature to look for is something that says "Single-Cycle Deselect Timing", or "Single-Cycle Disable" or "Pentium(TM) BSRAM-compatible" or something similar.

Unfortunately the R5000 was developed before Single Cycle Deselect BSRAMs were specified, so it does not implement the single cycle deselect mode bit.

For the R5000, mode bits 16 and 17 are designated to specify the size of external cache. The state of these two bits is placed into the config register at bits 20 and 21, respectively. These bits have no effect on the hardware of the R5000, RM5200 or RM7000. Therefore, for the RM5200 and the RM7000 these bits are referred to as System Configuration Identifiers. They can be set to reflect any system configuration that the system designer wants.

However, if you were going to design an RM5271 into an existing R5000 product, you could retain the existing settings for mode bits 16 and 17 so as to maintain software compatibility. If on the other hand you were designing a system from scratch, and planning on eventually using the RM7000, with its ability to support up to 8 Mbytes of external cache, you would want to re-assign these bits so that they span the external cache range you anticipate using on the low end product with the RM5271 and on the high end product with the RM7000.

Conclusion

Implementing an external cache is not difficult once its operations are understood. External cache can yield over two times the performance of a similar system without one. Buying a microprocessor that could yield such improvement without external cache would be cost prohibitive.

The RM5271, with the Single Cycle Deselect (SCD) mode bit, offers a significant improvement over the R5000. The SCD mode bit allows the external cache to be implemented with lower cost commodity Synchronous SRAMs.

Tying it all together

Connections for 2MB external cache using x18 SRAMs

[Figure: block diagram connecting the RM5271, the External Agent, the Data SRAM and the Tag SRAM. RM5271 and External Agent signals: SysCmd[8:0], SysADC[7:0], SysAD[63:0], ScLine[15:0], ScWord[1:0], ScDCE*, ScCWE*, ScDOE*, ScTOE*, ScTCE*, ScTDE*, ScMatch, ScValid. Data SRAM signals: SysADC[7:0], SysAD[63:0], ScLine[15:0], ScWord[1:0], ScDCE*, ScCWE*, ScDOE*. Tag SRAM signals: SysAD[35:19], ScLine[15:0], ScWord[1:0], ScCWE*, ScTCE*, ScTOE*, ScTDE*, ScMatch, ScValid.]

SRAM Pins   TAG            Data[71:54]    Data[53:36]    Data[35:18]    Data[17:0]
DQ18        ScValid        SysADC7        SysADC5        SysADC3        SysADC1
DQ[17:10]   SysAD[35:28]   SysAD[63:56]   SysAD[47:40]   SysAD[31:24]   SysAD[15:8]
DQ9         SysAD27        SysADC6        SysADC4        SysADC2        SysADC0
DQ[8:1]     SysAD[26:19]   SysAD[55:48]   SysAD[39:32]   SysAD[23:16]   SysAD[7:0]
SA[15:2]    ScLine[13:2]   ScLine[13:0]   ScLine[13:0]   ScLine[13:0]   ScLine[13:0]
SA1         ScLine1        ScWord1        ScWord1        ScWord1        ScWord1
SA0         ScLine0        ScWord0        ScWord0        ScWord0        ScWord0
OE* / G*    ScTOE*         ScDOE*         ScDOE*         ScDOE*         ScDOE*
CLK         SysClock       SysClock       SysClock       SysClock       SysClock
CE*         ScTCE0*        ScDCE0*        ScDCE0*        ScDCE1*        ScDCE1*
CE2         Pull-Up        Pull-Up        Pull-Up        Pull-Up        Pull-Up
CE2*        -              Pull-Down      Pull-Down      Pull-Down      Pull-Down
GW          ScCWE0*        ScCWE0*        ScCWE0*        ScCWE1*        ScCWE1*
BW1*        -              Pull-Up        Pull-Up        Pull-Up        Pull-Up
BW2*        -              Pull-Up        Pull-Up        Pull-Up        Pull-Up
BW3*        -              Pull-Up        Pull-Up        Pull-Up        Pull-Up
BW4*        -              Pull-Up        Pull-Up        Pull-Up        Pull-Up
ADSP*       -              Pull-Up        Pull-Up        Pull-Up        Pull-Up
ADSC*       -              Pull-Down      Pull-Down      Pull-Down      Pull-Down
ADV*        -              Pull-Up        Pull-Up        Pull-Up        Pull-Up
Match       ScMatch        -              -              -              -
MG*         Pull-Down      -              -              -              -

Connections for 512 KB external cache using x18 SRAMs.

Making sure it works

/******************************************************************************
 *
 * Title:  ecache_test.s
 * Author: David Y. Lau
 * Date:   March 4, 1999
 *
 ******************************************************************************
 *
 * Description:
 *   Simple test to check bits in external cache.
 *   Only meant as an example
 *
 * Compile options:
 *
 *
 * Special notes:
 *   MIPS III
 *   Expects primary cache to be 2-way set associative
 *   Expects each primary set to be 16KB each (32KB total)
 *   Needs to run in kernel mode as it uses kseg0 addresses
 *
 * Copyright 1999 by QED, Inc.
 *****************************************************************************/

/*
 * include files:
 */
#include "proj.s"
include(DIAG_ROOT`/include/template.m4')

#include "cp0_r4k.h"
#include "regdef.h"
#include "standard.h"
#include "sys/reg.h"
#include "sys/asm.h"

        .text
        .set    noat
        .set    noreorder

/* Primary cache TagLo fields and geometry */
#define TAGLO_PSTATE    0xc0
#define TAGLO_PTAG      0xffffff00
#define ADDR_MASK       0x1FFFF000
#define PTAG_ADDR_SHIFT 4
#define P_SET_SIZE      0x4000          /* 16KB per set */
#define CACHELINE_SIZE  0x20            /* 32 bytes */

define(CHECK_NOT_IN_PRIMARY,`
        cache   Index_Load_Tag_D,0($1)          /* look in one data cache set */
        mfc0    r7,C0_TAGLO
        li      r8,TAGLO_PSTATE
        and     r9,r7,r8                        /* Extract state info */
        beq     r9,r0,2f                        /* if not valid move on */
        nop
        li      r8,TAGLO_PTAG                   /* Extract tag info */
        and     r9,r7,r8
        sll     r9,r9,PTAG_ADDR_SHIFT
        and     r7,$1,ADDR_MASK
        beq     r7,r9,Failure                   /* compare tag bits */
        nop
2:
        cache   Index_Load_Tag_D,P_SET_SIZE($1) /* look in other dcache set */
        mfc0    r7,C0_TAGLO
        li      r8,TAGLO_PSTATE
        and     r9,r7,r8                        /* Extract state info */
        beq     r9,r0,2f                        /* if not valid move on */
        nop
        li      r8,TAGLO_PTAG                   /* Extract tag info */
        and     r9,r7,r8
        sll     r9,r9,PTAG_ADDR_SHIFT
        and     r7,$1,ADDR_MASK
        beq     r7,r9,Failure                   /* compare tag bits */
        nop
2:
        nop
')

#define TAGLO_SSTATE    0x1c00
#define TAGLO_STAG      0xffff8000
#define ADDR_SMASK      0x1FF80000
#define STAG_ADDR_SHIFT 4

define(CHECK_IN_SECONDARY,`
        cache   Index_Load_Tag_S,0($1)          /* look in the secondary */
        mfc0    r7,C0_TAGLO
        li      r8,TAGLO_SSTATE
        and     r9,r7,r8                        /* Extract state info */
        beq     r9,r0,Failure
        nop
        li      r8,TAGLO_STAG                   /* Extract tag info */
        and     r9,r7,r8
        sll     r9,r9,STAG_ADDR_SHIFT
        and     r7,$1,ADDR_SMASK
        bne     r7,r9,Failure                   /* compare tag bits */
        nop
')

/***************************************/
/* Make sure External Cache is enabled */
/***************************************/

NESTED(ENTRY,0,zero)

        mfc0    r1,C0_CONFIG
        li      r2,~7                   /* clear kseg0 cache coherency bits */
        and     r1,r1,r2
        ori     r1,r1,0x1003            /* set writeback cache coherency */
                                        /* and enable external cache */
        mtc0    r1,C0_CONFIG
        nop
        nop
        nop
        nop
        nop
        nop
        nop
        nop

#define Index_Store_Tag_S       0xb
#define Index_Load_Tag_S        0x7
#define Index_Load_Tag_D        0x5
#define TAGLO_EC_MASK           0x400

#define PATTERN         0xa5a5a5a5      /* modify this for your pattern */
#define STARTADDRESS    0x80040000      /* modify for your system's design */
                                        /* it should be some kseg0 address */

#define SIZE_OF_EC      0x80000         /* modify this for your EC size */
                                        /* here EC is ½ MB */

/*******************************************/
/* Check we can write the valid bit in tag */
/*******************************************/

        li      r1,PATTERN
        li      r2,STARTADDRESS
        li      r20,STARTADDRESS + SIZE_OF_EC
        li      r4,TAGLO_EC_MASK
loop:
        mtc0    r1,C0_TAGLO
        cache   Index_Store_Tag_S,0(r2) /* write the tag */
        nop
        nop
        nop                             /* to test retention */
        nop                             /* put a delay here */
        nop
        nop
        cache   Index_Load_Tag_S,0(r2)  /* read the tag */
        mfc0    r3,C0_TAGLO
        and     r3,r3,r4                /* mask out bits that */
        and     r5,r1,r4                /* can't be written */

        bne     r5,r3,Failure           /* check expected values */
        nop
        bne     r2,r20,loop
        addi    r2,r2,CACHELINE_SIZE    /* goto next cache line */

/**************************************/
/* Let's test the L2 cache itself now */
/**************************************/

#define Hit_Writeback_Inv_D 0x15

#define DWORD_PATTERN 0xfefefefefefefefe /* modify this for your pattern */

        dli     r1,DWORD_PATTERN
        li      r2,STARTADDRESS         /* cache line aligned KSEG0 address */
        li      r20,STARTADDRESS + SIZE_OF_EC

loop2:
        sd      r1,0(r2)                /* Store Pattern */
        sd      r1,8(r2)
        sd      r1,16(r2)
        sd      r1,24(r2)               /* write a complete cache line */

        cache   Hit_Writeback_Inv_D,0(r2) /* kick data out of primary */
        nop                             /* into secondary */
        nop
        nop
        nop                             /* to test retention */
        nop                             /* put a delay here */
        nop
        nop
        nop
        CHECK_NOT_IN_PRIMARY(r2)        /* if you're paranoid */
        CHECK_IN_SECONDARY(r2)          /* if you're paranoid */
        ld      r10,0(r2)               /* fetch 4 double words from EC */
        ld      r11,8(r2)
        ld      r12,16(r2)
        ld      r13,24(r2)

        bne     r10,r1,Failure          /* check expected data */
        nop
        bne     r11,r1,Failure
        nop
        bne     r12,r1,Failure
        nop
        bne     r13,r1,Failure
        nop
        bne     r2,r20,loop2
        addi    r2,r2,CACHELINE_SIZE    /* goto next cache line */

halt(PASS)

Failure:
fail:
        halt(FAIL)
END(ENTRY)
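The checks made by the listing above can also be modelled on a host, which may help when choosing test patterns or decoding a TagLo value captured from a failing board. The following is a minimal sketch in C, not part of the original test. The constants are copied from the listing, while the function names (config_enable_ec, tag_bits_ok, in_secondary) are hypothetical and the kseg0 range check is an added assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Constants copied from the assembly listing. */
#define TAGLO_EC_MASK   0x400u
#define TAGLO_SSTATE    0x1c00u
#define TAGLO_STAG      0xffff8000u
#define ADDR_SMASK      0x1FF80000u
#define STAG_ADDR_SHIFT 4

/* Config update done at ENTRY: clear the three kseg0 coherency bits,
 * then OR in 0x1003 (writeback coherency plus external cache enable). */
static uint32_t config_enable_ec(uint32_t config)
{
    return (config & ~7u) | 0x1003u;
}

/* First loop: only the bits under TAGLO_EC_MASK are compared between
 * the pattern written to TagLo and the value read back. */
static bool tag_bits_ok(uint32_t written, uint32_t read_back)
{
    return (written & TAGLO_EC_MASK) == (read_back & TAGLO_EC_MASK);
}

/* Model of CHECK_IN_SECONDARY: the line must have state bits set and
 * its tag, shifted into address position, must match the address. */
static bool in_secondary(uint32_t taglo, uint32_t kseg0_addr)
{
    /* Assumption: caller passes a kseg0 address (0x80000000-0x9FFFFFFF). */
    assert((kseg0_addr & 0xE0000000u) == 0x80000000u);
    if ((taglo & TAGLO_SSTATE) == 0)
        return false;                   /* state bits clear: not valid */
    uint32_t tag_addr = (taglo & TAGLO_STAG) << STAG_ADDR_SHIFT;
    return tag_addr == (kseg0_addr & ADDR_SMASK);
}
```

For example, with STARTADDRESS at 0x80040000 the masked address is zero, so a matching TagLo has only state bits set. At 0x80080000 the masked address is 0x80000, and a matching TagLo carries 0x8000 in its tag field.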


Cache Coherency / Access / Data movement / Sys. I'face Transactions

Write Through, No Write Allocate
    Load Hit
        Data movement: Data fetched from primary data cache.
        Sys. I'face transactions: None
    Load Miss
        Data movement: Cache line fetched from main memory.
        Sys. I'face transactions: Block Read
    Store Hit
        Data movement: Store DW or less to primary and main memory.
        Sys. I'face transactions: Non-Block Write
    Store Miss
        Data movement: Store DW or less to main memory.
        Sys. I'face transactions: Non-Block Write

Write Through with Write Allocate
    Load Hit
        Data movement: Data fetched from primary data cache.
        Sys. I'face transactions: None
    Load Miss
        Data movement: Cache line fetched from main memory.
        Sys. I'face transactions: Block Read
    Store Hit
        Data movement: Store DW or less to primary cache and main memory.
        Sys. I'face transactions: Non-Block Write
    Store Miss
        Data movement: Cache line fetched from main memory, store DW or less to primary cache and main memory.
        Sys. I'face transactions: Block Read, Non-Block Write

Write Back
    Load Hit
        Data movement: Data fetched from primary data cache.
        Sys. I'face transactions: None
    Load Miss
        Data movement: Cache line fetched from secondary cache or main memory. If fetched from main memory, the cache line is also written into secondary cache.
        Sys. I'face transactions: Block Read, followed by Block Write if displacement takes place.
    Store Hit
        Data movement: Data written to primary data cache. Data is returned to main memory only when line is flushed from cache, either by a read that displaces this cache location, or by a writeback cache instruction.
        Sys. I'face transactions: None
    Store Miss
        Data movement: Cache line fetched from secondary cache or main memory, store DW or less to data cache. Data is returned to main memory only when line is flushed from cache, either by a read that displaces this cache location, or by a writeback cache instruction.
        Sys. I'face transactions: Block Read, followed by Block Write if displacement takes place.
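When writing a bus monitor or a simulation checker, it can be convenient to hold the chart above as a lookup table. The sketch below is one possible encoding in C. The enum and function names are hypothetical, and the entries simply mirror the "Sys. I'face Transactions" column; it does not model the data movement itself.

```c
#include <assert.h>

/* Coherency modes and access outcomes, named after the chart. */
typedef enum { WT_NO_ALLOCATE, WT_ALLOCATE, WRITE_BACK, COHERENCY_COUNT } coherency;
typedef enum { LOAD_HIT, LOAD_MISS, STORE_HIT, STORE_MISS, ACCESS_COUNT } access_kind;

/* One value per distinct entry in the transactions column. */
typedef enum {
    SYSIF_NONE,
    SYSIF_BLOCK_READ,
    SYSIF_NONBLOCK_WRITE,
    SYSIF_BLOCK_READ_NONBLOCK_WRITE,         /* Block Read, Non-Block Write */
    SYSIF_BLOCK_READ_THEN_WRITE_IF_DISPLACED /* Block Read, then Block Write
                                                if displacement takes place */
} sysif;

static const sysif chart[COHERENCY_COUNT][ACCESS_COUNT] = {
    /* Write Through, No Write Allocate */
    { SYSIF_NONE, SYSIF_BLOCK_READ, SYSIF_NONBLOCK_WRITE, SYSIF_NONBLOCK_WRITE },
    /* Write Through with Write Allocate */
    { SYSIF_NONE, SYSIF_BLOCK_READ, SYSIF_NONBLOCK_WRITE,
      SYSIF_BLOCK_READ_NONBLOCK_WRITE },
    /* Write Back */
    { SYSIF_NONE, SYSIF_BLOCK_READ_THEN_WRITE_IF_DISPLACED, SYSIF_NONE,
      SYSIF_BLOCK_READ_THEN_WRITE_IF_DISPLACED },
};

/* Look up the expected system interface activity for one access. */
static sysif sysif_transaction(coherency c, access_kind a)
{
    assert(c < COHERENCY_COUNT && a < ACCESS_COUNT);
    return chart[c][a];
}

/* Per the introduction, only Write Back space uses the external cache. */
static int uses_external_cache(coherency c)
{
    return c == WRITE_BACK;
}
```

A checker could call sysif_transaction for each observed access and compare the result with the transactions actually seen on the system interface.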

This document may, wholly or partially, be subject to change without notice. Quantum Effect Devices, Inc. reserves the right to make changes to its products or specifications at any time without notice, in order to improve design or performance and to supply the best possible product. All rights are reserved. No one is permitted to reproduce or duplicate, in any form, the whole or part of this document without QED’s permission. QED will not be held responsible for any damage to the user or any property that may result from accidents, misuse, or any other causes arising during operation of the user’s unit. LIFE SUPPORT POLICY: QED’s products are not designed, intended, or authorized for use as components intended for surgical implant into the body, or other applications intended to support or sustain life, or for any other application in which failure of the product could create a situation where personal injury or death may occur. Should a customer purchase or use the products for any such unintended or unauthorized application, the customer shall indemnify and hold QED and its officers, employees, subsidiaries, affiliates, and distributors harmless against all claims, costs, damages, and expenses, and reasonable attorney fees arising out of, directly or indirectly, any claim of personal injury or death associated with such unintended or unauthorized use, even if such claim alleges that QED was negligent regarding the design or manufacture of the part. QED does not assume any responsibility for use of any circuitry described other than the circuitry embodied in a QED product. The company makes no representations that the circuitry described herein is free from patent infringement or other rights of third parties, which may result from its use. No license is granted by implication or otherwise under any patent, patent rights, or other rights, of QED. The QED logo and RISCMark are trademarks of Quantum Effect Devices, Inc. 
MIPS is a registered trademark of MIPS Technologies, Inc. All other trademarks are the respective property of the trademark holders. Laurin McLaurin RM5271-AN1161010002
