
SPEDE-2000 Lab Manual, CSU Sacramento 79

Chapter 9. Address Translation and Virtual Memory

You step in the stream, But the water has moved on. Page not found. - Haiku error message

The Intel386 and later models include an on-chip paging memory-mapping unit (MMU). Paging occurs after the logical address has been resolved to a linear address. If paging is enabled, the linear address is split into a page number and an offset, the page number is run through the page tables to find a page frame, and the resulting physical address is sent to the bus interface unit of the CPU. This creates a virtual address space.

The first section describes how a logical address (selector and offset) is converted to a physical address. It starts with an overview of the whole process, then explains the two steps in detail. The next section describes the paging system. When a page fault occurs, the offending linear address is stored in CR2. The chapter ends with information about setting up virtual address spaces. Many O.S. textbooks describe the segmentation-paging system; you might want to also reference those texts.

The Address Conversion Process

All addresses on the CPU begin as logical addresses consisting of a selector and an offset. The CPU sends a physical address to memory. When paging is enabled, each address goes through two conversions: segmentation, then paging. A protected-mode OS uses both. First the selector is used to index into a descriptor table, usually the Global Descriptor Table (GDT). The descriptor provides a base address, which is added to the offset provided by the original logical address. This sum is the linear address. The offset is also compared against the segment’s limit value to ensure it is within bounds. See Figure 9-1 for a picture of this.
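The segmentation step described above can be sketched in a few lines of C. This is only an illustration, not SPEDE or FLAMES code: the type `seg_desc_t` and the function `logical_to_linear()` are hypothetical names, the descriptor is reduced to its base and limit fields, and the limit is assumed to be the highest valid offset in the segment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified descriptor: only the two fields used here. */
typedef struct {
    uint32_t base;   /* linear address where the segment starts          */
    uint32_t limit;  /* assumed to be the highest valid offset in segment */
} seg_desc_t;

/* Returns true and stores the linear address if the offset is in bounds.
 * Returns false where the real CPU would raise a protection fault. */
bool logical_to_linear(const seg_desc_t *desc, uint32_t offset,
                       uint32_t *linear_out)
{
    assert(desc != NULL && linear_out != NULL);

    if (offset > desc->limit) {
        return false;               /* offset is outside the segment */
    }
    *linear_out = desc->base + offset;
    return true;
}
```

With a flat segment (base zero, limit 0xFFFFFFFF) every offset passes the check and the linear address equals the offset.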

With paging enabled (bit 31 in CR0), the linear address is really two values: a page frame number (PFN) and an offset within that page. The Intel Pentium uses a two-level scheme for page numbers, so the PFN is actually a page directory index and a page table index. This scheme reduces the number of page tables required when there are “holes” (unmapped areas) in the address space. This way a very large “address space” can be supported with a small amount of physical RAM. Each “page entry” is 4 bytes.
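As a small illustration of the two-level split, the sketch below breaks a 32-bit linear address into its directory index (bits 31 to 22), table index (bits 21 to 12), and page offset (bits 11 to 0). The names `linear_fields_t` and `split_linear()` are made up for this example and are not part of SPEDE.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the three fields of a linear address. */
typedef struct {
    uint32_t dir_index;    /* bits 31..22: index into the page directory */
    uint32_t table_index;  /* bits 21..12: index into the page table     */
    uint32_t offset;       /* bits 11..0 : byte offset within the page   */
} linear_fields_t;

linear_fields_t split_linear(uint32_t linear)
{
    linear_fields_t f;

    f.dir_index   = (linear >> 22) & 0x3FFu;
    f.table_index = (linear >> 12) & 0x3FFu;
    f.offset      = linear & 0xFFFu;

    /* Sanity check: the three fields reassemble into the original value. */
    assert(((f.dir_index << 22) | (f.table_index << 12) | f.offset) == linear);
    return f;
}
```

For example, linear address 0x00403025 splits into directory index 1, table index 3, and offset 0x025.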

The descriptor table and the page tables are all located in system memory. To realize one memory access for a program, the CPU must actually read the descriptor from memory, a page directory entry, and a page table entry. So for every program memory access, the CPU must perform three additional accesses. This would really slow down any program. For that reason, the CPU caches as much information as it can on the chip itself. In normal operation, only three or so selectors are used. When a selector register is first loaded, the CPU checks to make sure the descriptor is valid, and if so, loads its contents into the selector’s storage (these registers are hidden from the programmer). When the CPU performs the addition of the segment’s base address and the offset in the logical address, both values are already inside the CPU.

Even though each page table is 4K, the CPU doesn’t need to read the whole thing to translate a linear address. It needs only one page entry from the page directory (top level) and one entry from the page table (second level). A translation lookaside buffer (TLB) is used to cache these entries. It remembers recent page entries. Each time a new set of page tables is used (e.g., each address space has its own set), this cache must be flushed (i.e., emptied). This is done automatically by the CPU when CR3 (page directory base register) is loaded. After the TLB is flushed (i.e., “cold”), the next few memory accesses will incur a lot of memory clock cycles.

[Figure 9-1: Overview of Segmentation and Paging. Segmentation: the selector of the logical address picks a segment descriptor from the Global Descriptor Table (from GDTR), and the descriptor’s base is added to the offset to form the linear address. Paging: the linear address is divided into Dir, Table, and Offset fields, which select page entries to form the physical address.]

♦ Segmentation

Figure 9-2 below shows how a segmented (logical) address becomes a linear address. Open arrows indicate base addresses. Each descriptor has four fields of primary interest. The first is the type information defining it as a code or data segment. Second are the access (permission) bits, which determine if the whole segment can be written (if data) or executed (if code). Third is the base linear address of the segment, and last is the size or limit of the segment.

For 159, all the segments are set up with a base address of zero. This way all addresses point to the same place in the address space. The limit is set to 4GB, so that it won’t get in your way. All this is done by the boot loader, before FLAMES runs.

The CPU register GDTR (global descriptor table register) supplies a base address and limit for the descriptor table. Using the selector’s upper 13 bits as an index, a descriptor is selected and its base and limit fields are examined. If the limit is exceeded, a general protection fault will occur. This stops the memory access and terminates the instruction, but the EIP register will point to the faulting instruction so it can be retried once the OS has recovered from the general protection fault. Note the LDTR holds a selector, not a pointer value. Its base and limit come from the descriptor it indicates.

There are a couple of places where an incorrect segment can be referenced. First, the descriptor index must be within the descriptor table. Bit 2 of the selector is the table indicator, and determines whether the GDT (zero) or the LDT (one) is used. The segment might also be accessed in an invalid manner, e.g., writing to a read-only segment. All these conditions will generate a general protection fault.
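The selector layout described above can be decoded with a few shifts and masks. The sketch below is illustrative only; `selector_fields_t`, `decode_selector()`, and `selector_index_in_table()` are hypothetical names. It also assumes two architectural details not spelled out in the text: the low two selector bits hold the requested privilege level, and each descriptor occupies 8 bytes, with the table limit giving the offset of the last valid byte in the table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of a 16-bit selector. */
typedef struct {
    uint16_t index;    /* bits 15..3: descriptor index (13 bits)        */
    bool     use_ldt;  /* bit 2: table indicator, 0 = GDT, 1 = LDT      */
    uint8_t  rpl;      /* bits 1..0: requested privilege level (assumed) */
} selector_fields_t;

selector_fields_t decode_selector(uint16_t sel)
{
    selector_fields_t f;

    f.index   = (uint16_t)(sel >> 3);
    f.use_ldt = ((sel >> 2) & 0x1u) != 0;
    f.rpl     = (uint8_t)(sel & 0x3u);

    assert(f.index < 8192u);   /* a 13-bit index selects one of 8,192 entries */
    return f;
}

/* True if the whole 8-byte descriptor lies inside a table whose limit is
 * table_limit (assumed: byte offset of the last valid byte). */
bool selector_index_in_table(uint16_t sel, uint16_t table_limit)
{
    uint32_t last_byte = (uint32_t)(sel >> 3) * 8u + 7u;
    return last_byte <= table_limit;
}
```

For instance, selector 0x08 decodes to index 1 in the GDT, while 0x0F decodes to index 1 in the LDT with a requested privilege level of 3.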


[Figure 9-2. Logical to Linear Address Translation (first part). The 16-bit selector of the logical address supplies a 13-bit index (TI=0, so use GDT) into the Global or Local Descriptor Table (8,192 entries each, located through GDTR and LDTR). The selected code or data segment descriptor supplies the segment’s base address and limit. The 32-bit offset is added to the segment’s base address to form the linear address, and the offset is compared with the segment’s limit: if Offset >= Limit, then SegFault!]

♦ How the Page Tables Work

This section describes how a linear address is translated through the page tables to generate a physical address. The page directory and all the page tables are stored in main memory. If paging is disabled, then the linear address is emitted from the CPU as the physical address.

If paging is enabled, the two-level page tables are referenced. As shown in Figure 9-3 below, the linear address is chopped into three fields (described next). Two of those fields index into page tables with 1024 page table entries (PTE). Each PTE contains a physical base address and some status bits. Twenty bits form the base address used in the next level down. The base address from the page table provides the upper 20 address bits of the frame. The CPU will cache portions of the tables in a Translation Look-aside Buffer (TLB). Thus, if it caches two entries, it can now access a 4K chunk of linear memory without having to read those parts again.

GENERATING A PHYSICAL ADDRESS

The base address of the segment is added to the offset from the memory reference to generate a linear address. If paging is enabled, the CPU’s memory interface unit (MIU) gets a chance to change this address. The linear address is split into three pieces. The top two fields are used as index values into the page tables for the current address space. The page tables form a sparse, two-level, 1024-ary tree, anchored by the CPU’s CR3 register (the page directory base register).

CR3 supplies the base address of the page directory. Address bits 31 to 22 index into the directory to get a page table pointer. Address bits 21 to 12 are used to select the page table entry with the frame’s base address. This base is combined with the lower 12 bits (page offset) to get a physical address inside the page frame. Each index is 10 bits, so it can index 1024 different page entries. Each page entry is 4 bytes, therefore each page table is 4K bytes in size. This is also the size of a page frame!

[Figure 9-3. Logical to Physical Address Translation (second part). The linear address is divided, from msb to lsb, into a Page Directory Index (10 bits), a Page Table Index (10 bits), and a Byte Offset (12 bits). PDBR (CR3) locates the page directory, the directory entry locates a page table, and the page table entry supplies the page frame’s base address, which is combined with the offset to form the physical address.]

When paging is enabled, the two-level page tables are referenced. The upper ten bits index into a page directory structure, and the next ten bits index into the page table that the directory entry points to. Each page table entry (PTE) contains a physical base address and some status bits. Twenty bits form this physical base address, and they are combined with the lower twelve bits of the linear address (a perfect match) to finally generate the physical address. (The status bits are masked out when forming an address.) If either the page directory entry or the PTE is marked not present, a page fault will occur. Register CR2 will contain the virtual (linear) address that caused the fault. (See Section 3-1 in Intel Architecture Developer’s Manual, volume 3 for an overview of this process.)
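The whole two-level lookup can be modeled in C against a simulated block of physical memory. This is a sketch under stated assumptions, not how FLAMES or the hardware is implemented: `walk_page_tables()` and `read_entry()` are hypothetical names, physical memory is modeled as a plain array of 32-bit words, no bounds checking is done, and a fault is reported through a return value and a `cr2_out` parameter rather than an exception.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRY_PRESENT    0x00000001u   /* bit 0: present flag        */
#define ENTRY_BASE_MASK  0xFFFFF000u   /* bits 31..12: base address  */

/* Hypothetical helper: read one 4-byte entry from simulated physical
 * memory, which is modeled as an array of words. No bounds checking. */
static uint32_t read_entry(const uint32_t *phys_mem, uint32_t phys_addr)
{
    assert((phys_addr & 0x3u) == 0);   /* entries are 4-byte aligned */
    return phys_mem[phys_addr / 4u];
}

/* Returns true and stores the physical address on success. On a miss,
 * returns false and stores the faulting linear address in *cr2_out. */
bool walk_page_tables(const uint32_t *phys_mem, uint32_t cr3,
                      uint32_t linear, uint32_t *phys_out, uint32_t *cr2_out)
{
    uint32_t dir_index   = (linear >> 22) & 0x3FFu;
    uint32_t table_index = (linear >> 12) & 0x3FFu;
    uint32_t offset      = linear & 0xFFFu;
    uint32_t pde, pte;

    assert(phys_mem != NULL && phys_out != NULL && cr2_out != NULL);

    /* First level: page directory entry. */
    pde = read_entry(phys_mem, (cr3 & ENTRY_BASE_MASK) + dir_index * 4u);
    if ((pde & ENTRY_PRESENT) == 0) {
        *cr2_out = linear;
        return false;
    }

    /* Second level: page table entry. */
    pte = read_entry(phys_mem, (pde & ENTRY_BASE_MASK) + table_index * 4u);
    if ((pte & ENTRY_PRESENT) == 0) {
        *cr2_out = linear;
        return false;
    }

    /* Status bits are masked out; the low 12 bits pass through unchanged. */
    *phys_out = (pte & ENTRY_BASE_MASK) | offset;
    return true;
}
```

A caller would place a page directory in one 4K frame of the array, a page table in another, and pass the directory’s address as `cr3`.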

Use the VERR instruction to verify a read through a ring 3 selector. You may have experienced the target computer spontaneously resetting itself. One cause of this is loading CR3 with a NULL value. Page frame 0 is not mapped, which causes the CPU to double-fault when generating an address. The CPU’s response is to shut itself down. The BIOS reacts by either turning the computer off or rebooting the whole system.

PAGE ENTRY CONTROL BITS

The status bits in each page table entry are important. They are described a few pages below. Each PTE is split into a 20-bit address and 12 bits of control. Intel has set aside bits 9, 10 and 11 for program usage; the hardware will not modify them. The most important bit is number 0, the “present” flag. When set to 0, it tells the MIU the base address in the PTE is not valid. For instance, the present flag is cleared on the first entry in the first page table. This affects the 4K range from 0K to 4K in the linear address map. It is used to catch NULL pointer references! (Now you know.) Another important flag is bit 1, which tells if the page frame can be written to (one) or is read-only (zero).

Selectors and Virtual Memory

It isn’t the selectors that give you virtual memory; it’s the page tables and the address translation they provide. The important piece of data from the selector’s descriptor is the base address in memory. This address and the offset in the memory access are added together. The sum is used to index into the page tables (described in more detail below).

♦ Segments

Addresses which are output by the ‘core CPU’ in an Intel Architecture machine are virtual addresses in segmented form. These addresses are first fed to the Segmentation Unit (SU), which is separate from the CPU core but is internal to the processor chip (it is part of the circuitry called the “Address Translation Unit”). The SU translates the segmented address into a “linear” form – that is, an equivalent address which references (virtual) memory as a contiguous sequence of bytes with linearly increasing addresses.

The address translation (from segmented to linear form) performed by the SU is controlled by a set of “Segment Descriptors” contained in a “Descriptor Table” (either the “Global Descriptor Table (GDT)” or the “Local Descriptor Table (LDT)”) in main memory. There are many different Segment Descriptors – one for code references, one for data references, another for stack references, and so forth. These Segment Descriptors must be set up correctly in order for address translation to work properly.

The FLAMES startup code creates a set of Segment Descriptors in main memory – one for code (called the “Kernel Code Segment”), one for data (the “Kernel Data Segment”) and so forth. The startup code arranges that when a downloaded program starts running, the CPU can correctly access these Segment Descriptors (see Appendix B for details). The values in the Segment Descriptors are such that (virtual) memory appears to the Segmentation Unit translation hardware as one contiguous sequence of bytes. That is, every segment, regardless of type, has a starting (virtual) address of “zero” and is 4Gbytes long.

♦ Page Translation

Linear (virtual) addresses which are output by the Segmentation Unit are fed into the “Paging Unit”. This piece of hardware uses a translation table to perform a mapping from a given virtual address to the corresponding physical address. If the mapping can be completed successfully, the translated address is output to physical memory (RAM). If not, the Paging Unit generates a Page Fault interrupt (INT 14) instead.

The translation table is a hierarchical arrangement of translation values. At the top of the hierarchy is the Page Directory table. Entries in the Page Directory table point to Page Tables, each of which contains a translation value for a collection of individual virtual space pages. Both the Page Directory table and the individual Page Tables are stored in main memory; CPU Control Register CR3 contains the address of the base of the Page Directory table.

The Page Directory Table contains 1024 entries, each of which is 4 bytes. Each entry contains a pointer to the base of a Page Table, along with some attribute bits. Each Page Table, in turn, contains 1024 4-byte entries, each of which contains a translation value (frame address) of one 4K page of virtual space (again along with some attribute bits). It is these numbers – 1K of Page Directory entries × 1K of Page Table entries × 4K pages – which produce the fact that the size of Virtual Space is 4GB.
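The size arithmetic in the paragraph above can be checked mechanically. The short sketch below uses made-up constant and function names (`VM_ENTRIES_PER_TABLE`, `vm_table_bytes()`, and so on) and simply multiplies out the figures given in the text.

```c
#include <assert.h>
#include <stdint.h>

/* Figures taken from the text; the names are hypothetical. */
#define VM_ENTRIES_PER_TABLE  1024u   /* entries in a directory or page table */
#define VM_ENTRY_BYTES        4u      /* size of one entry                    */
#define VM_PAGE_BYTES         4096u   /* size of one page (and one frame)     */

/* Bytes occupied by one page directory or one page table. */
uint32_t vm_table_bytes(void)
{
    return VM_ENTRIES_PER_TABLE * VM_ENTRY_BYTES;
}

/* Bytes of virtual space covered by a single page directory entry. */
uint32_t vm_bytes_per_dir_entry(void)
{
    return VM_ENTRIES_PER_TABLE * VM_PAGE_BYTES;
}

/* Total virtual space: directory entries x table entries x page size.
 * A 64-bit result is needed because 4GB does not fit in 32 bits. */
uint64_t vm_total_bytes(void)
{
    uint64_t total = (uint64_t)VM_ENTRIES_PER_TABLE * VM_ENTRIES_PER_TABLE
                     * VM_PAGE_BYTES;
    assert(total == ((uint64_t)1 << 32));
    return total;
}
```

One table is exactly one page frame in size (4K), each directory entry covers 4 megabytes, and the product of all three figures is 2^32 bytes.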

The following diagram shows the arrangement of the Page Directory and Page Tables, and how a virtual address is translated by the Paging Unit into a physical address.


[Figure 9-4. Linear to Physical (Paging Tables). CR3 (PDBR) points to the first-level table, the “page directory”, which holds 1024 4-byte entries (0 to 1023). Each directory entry maps 4 megs through one page table. Each page table also holds 1024 4-byte entries, with status bits in bits 11..0. The linear address (from the segmentation unit) is divided into a page directory entry index (bits 31..22), a page table entry index (bits 21..12), and an offset into the page (bits 11..0); the page offset is never changed. The physical address (to memory) consists of the frame base address (bits 31..12) and the offset into the frame (bits 11..0).]

The current address mapping is determined by the page directory pointer stored in the CPU’s CR3 register. Use the FLAMES command “CPU” to print out the value. The Page Directory entries and the Page Table entries have nearly identical formats; they differ in only two bit positions. The Page Directory entries have the following structure:


Bits     Meaning
31..12   Upper 20 bits of base address of Page Table
11..9    Available for OS use (not used by MMU hardware)
8        Global Page (leave 0)
7        Page Size (0 = 4K)
6        Reserved (0)
5        Accessed (1 = this Page Table has been accessed)
4        Cache Disabled (0)
3        Cache Policy (1 = WriteThrough; 0 = WriteBack)
2        User/Supervisor (0 = Supervisor and Page Table cannot be accessed in CPL3)
1        Read/Write (0 = Page Table is Read-Only)
0        Present (1 = present)

Figure 9-5: Fields of a Page Directory Entry

Page Table entries have the following structure:

Bits     Meaning
31..12   Upper 20 bits of base address of Page (i.e., Frame Number)
11..9    Available for OS use (not used by MMU hardware)
8        Global Page (leave 0)
7        Page Size (0 = 4K)
6        Dirty (1 = Frame contents have been modified)
5        Accessed (1 = Frame has been accessed)
4        Cache Disabled (0)
3        Cache Policy (1 = WriteThrough; 0 = WriteBack)
2        User/Supervisor (0 = Supervisor and Page Table cannot be accessed in CPL3)
1        Read/Write (0 = Frame is Read-Only)
0        Present (1 = present)

Figure 9-6: Fields of a Page Table Entry

The logical initial value for a Page Table entry, or a Page Directory entry, for an object which is present in memory is “base_addr | 0x7” – this indicates a page that is user-mode accessible, writable, and present. The FLAMES startup code creates an initial Page Directory with these values, then allocates a Page Table for each 4Mbyte block of installed physical memory and sets the Base Address value for each Page Table entry to be exactly the same as the corresponding physical frame base address. I.e., it creates what is called “straight-through” or “unity” mapping: every page maps to the frame of the same address. This is the default address translation in effect when a downloaded program starts running.
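To make the “unity” mapping concrete, the sketch below fills one page table so that every page maps to the frame with the same address, using the `base_addr | 0x7` form given above. It is an illustration under assumptions, not the FLAMES startup code: `make_entry()` and `fill_unity_page_table()` are hypothetical names, and the table is an ordinary C array rather than a frame of physical memory.

```c
#include <assert.h>
#include <stdint.h>

#define UNITY_ENTRIES     1024u   /* entries per page table            */
#define UNITY_FLAGS_UWP   0x7u    /* user-accessible, writable, present */

/* Build one entry from a 4K-aligned base address and its flag bits. */
uint32_t make_entry(uint32_t base_addr, uint32_t flags)
{
    assert((base_addr & 0xFFFu) == 0);   /* lower 12 bits must be zero */
    assert((flags & ~0xFFFu) == 0);      /* flags live in bits 11..0   */
    return base_addr | flags;
}

/* Fill the page table that belongs to directory slot dir_index so that
 * linear address X maps to physical address X within its 4Mbyte block. */
void fill_unity_page_table(uint32_t table[], uint32_t dir_index)
{
    uint32_t i;

    assert(dir_index < UNITY_ENTRIES);
    for (i = 0; i < UNITY_ENTRIES; i++) {
        uint32_t frame_base = (dir_index << 22) | (i << 12);
        table[i] = make_entry(frame_base, UNITY_FLAGS_UWP);
    }
}
```

For directory slot 1, the first entry becomes 0x00400007 and the last becomes 0x007FF007, covering the 4Mbyte block that starts at 4M.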

Caching policy can be set on a per-page basis. Normally, the policy should be cache enabled with write-back. If the page maps I/O registers, you would disable the caching. Write-through means the write must go out to main memory whenever the CPU commands it. Again, this is required when touching memory-mapped I/O registers. However, it slows down the machine instruction execution rate. Using write-back, the cache circuits can queue up the memory change, transferring it out to main memory when it has time. Also, several writes to nearby memory locations can be batched together and executed as a single write operation.


Using this technique of write-back, it is sometimes possible that memory will be updated in an order different than what the programmer expects. (This can also be caused by out-of-order instruction execution.) This situation is called “weak memory ordering” and occurs on processors that are highly pipelined. If a page is not cached, then the BIU will write to memory in exactly the order commanded by the instructions. This is called “strong memory ordering.” Also, several instructions can force all pending writes to complete before the instruction begins to execute. These instructions are all I/O instructions, IRET, LOCK, and moves to the control registers.

♦ Accessing a Page Table Entry

There are two ways to access an entry in the page directory, depending on how you declare the PDBR value. You can either say it is an unsigned 32-bit integer or a pointer.

    #include            // For pte_t typedef
    #include

    pte_t get_pagedir_ptr( uint32 pdbr, void * virt_addr )   /* FIRST ATTEMPT */
    {
        uint    pagedir_index = PAGE_DIRECTORY_NR(virt_addr);
        pte_t * entryptr = (pte_t *) (pdbr + pagedir_index*sizeof(pte_t));

        assert( 0 == (pdbr & PT_OFFSET_MASK) );

        return *entryptr;
    }   /* end get_pagedir_ptr() */

This function takes the process’s page-directory base pointer and a virtual address in that process. It looks into the page directory and returns an entry. From the return value, the caller should first make sure the entry is good by checking the present bit (bit 0). If OK, then it can extract the upper 20 bits to form the base address of the page table.

The page-directory base pointer is added to an index value. Since each page-table entry is 4 bytes in size, we need to scale up the index value. C does this scaling automatically in pointer arithmetic. However, since the PDBR value here is an integer, we must do the scaling ourselves (but see below). In both cases, we assert the lower 12 bits of the page-directory pointer are zero.

This next example shows the same function as above, but the PDBR value is a pointer. This simplifies the code to find the page entry pointer. Since it is a pointer, C will do the index scaling for us.

    #include            // For pte_t typedef
    #include

    pte_t get_pagedir_ptr( pte_t * pdbr, void * virt_addr )   /* SECOND ATTEMPT */
    {
        assert( 0 == (PTR2INT(pdbr) & PT_OFFSET_MASK) );
        {
            uint   pagedir_index = PAGE_DIRECTORY_NR(virt_addr);
            pte_t  entry = pdbr[pagedir_index];

            return entry;
        }
    }   /* end get_pagedir_ptr() */
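Once an entry has been fetched, the caller still has to test the present bit and mask off the status bits before using the base address. The sketch below shows one way to do that. It assumes an entry can be treated as a plain 32-bit value; `page_entry_t` and the helper names are hypothetical stand-ins, since the real `pte_t` in the SPEDE headers may be declared differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an entry; SPEDE's pte_t may differ. */
typedef uint32_t page_entry_t;

#define ENTRY_PRESENT    0x00000001u   /* bit 0                      */
#define ENTRY_WRITABLE   0x00000002u   /* bit 1                      */
#define ENTRY_USER       0x00000004u   /* bit 2                      */
#define ENTRY_BASE_MASK  0xFFFFF000u   /* bits 31..12: base address  */

bool entry_is_present(page_entry_t e)
{
    return (e & ENTRY_PRESENT) != 0;
}

bool entry_is_writable(page_entry_t e)
{
    return (e & ENTRY_WRITABLE) != 0;
}

/* Base address of the page table (for a directory entry) or of the
 * frame (for a page table entry). Only meaningful if present. */
uint32_t entry_base(page_entry_t e)
{
    assert(entry_is_present(e));
    return e & ENTRY_BASE_MASK;
}
```

A caller would check `entry_is_present()` first and treat a cleared bit the way the hardware would treat a page fault, instead of calling `entry_base()`.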