Main Memory

Electrical and Computer Engineering, Stephen Kim ([email protected])

ECE/IUPUI RTOS & APPS 1

Main Memory
- Background
- Swapping
- Contiguous allocation
- Paging
- Segmentation
- Segmentation with paging


Background

- A program must be brought into memory and placed within a process for it to be run.
- Input queue: the collection of processes on disk that are waiting to be brought into memory to run the program.
- User programs go through several steps before being run.
- From the memory-unit viewpoint, the CPU generates a stream of addresses.
  - It doesn't matter how they were generated or what they are for.


Binding of Instructions and Data to Memory

- Most systems allow a user program to reside in any part of physical memory.
  - How can a user program know where its data are?
- Address binding of instructions and data to memory addresses can happen at three different stages:
  - Compile time: if the memory location is known a priori, absolute code can be generated; the code must be recompiled if the starting location changes.
  - Load time: relocatable code must be generated if the memory location is not known at compile time.
  - Execution time: binding is delayed until run time if the process can be moved during its execution from one memory segment to another. Requires hardware support for address maps (e.g., base and limit registers).


Multistep Processing of a User Program


Logical vs. Physical Address Space
- The concept of a logical address space that is bound to a separate physical address space is central to proper memory management.
  - Logical address: generated by the CPU; also referred to as a virtual address.
  - Physical address: the address seen by the memory unit.
- Logical and physical addresses are the same in compile-time and load-time address-binding schemes; logical (virtual) and physical addresses differ in the execution-time address-binding scheme.


Memory-Management Unit (MMU)

- Hardware device that maps virtual addresses to physical addresses.
- In the MMU scheme, the value in the relocation register is added to every address generated by a user process at the time it is sent to memory.
- The user program deals with logical addresses; it never sees the real physical addresses.


Dynamic relocation using a relocation register


Dynamic Loading

- A routine is not loaded until it is called.
- Better memory-space utilization; an unused routine is never loaded.
  - All routines stay on disk in a relocatable load format. The main program is loaded into memory and executed. When a routine needs to call another routine, the caller first checks whether the other routine has been loaded. If not, the relocatable linking loader is called to load the desired routine into memory and to update the program's address table to reflect this change. Then control is passed to the newly loaded routine.
- Useful when large amounts of code are needed to handle infrequently occurring cases (e.g., error-handling routines).
- No special support from the operating system is required; it is implemented through program design.
  - It is the responsibility of the user to design the program to take advantage of such a method.


Dynamic Linking

- Similar to dynamic loading, but linking is postponed until execution time.
- A small piece of code, the stub, is used to locate the appropriate memory-resident library routine.
- The stub replaces itself with the address of the routine and executes the routine.
- All processes that use a language library execute only one copy of the library code.
- The operating system is needed to check whether the routine is in the process's memory address space.
- Dynamic linking is particularly useful for libraries.
  - A library may be replaced by a new version, and all programs that reference the library will automatically use the new version.


Overlays
- Allow a process to be larger than its allocated memory.
- Keep in memory only those instructions and data that are needed at any given time.
- Implemented by the user; no special support is needed from the operating system, but the programming design of the overlay structure is complex.
- Example:
  - Pass 1: 70 KB, Pass 2: 80 KB, symbol table: 20 KB, common routines: 30 KB.
  - Without overlays, 200 KB is required.
  - The overlay driver itself needs some memory, say 10 KB.


Overlays for a Two-Pass Assembler: with the overlay, 140 KB is enough (80 + 20 + 30 + 10).


Swapping (1)

- A process can be swapped temporarily out of memory to a backing store, then brought back into memory for continued execution.
- Backing store: a fast disk large enough to accommodate copies of all memory images for all users; it must provide direct access to these memory images.


Swapping (2)

- The major part of swap time is transfer time; total transfer time is directly proportional to the amount of memory swapped.
  - For example: a 1 MB process, a 5 MB/sec hard disk, no head seeks, average latency 8 msec.
  - The total swap time (swap-out plus swap-in) is 2 × (1 MB / (5 MB/sec) + 8 msec) = 416 msec.
  - If the system uses round-robin scheduling, the time quantum must be much larger than 416 msec.
- Modified versions of swapping are found on many systems, e.g., UNIX and Windows.
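The swap-time arithmetic above can be checked directly; this small Python sketch simply reproduces the 416 msec figure using the slide's numbers (1 MB process, 5 MB/sec disk, 8 msec latency):

```python
# Reproducing the slide's swap-time arithmetic: a 1 MB process,
# a 5 MB/sec disk, 8 msec average latency, no head seeks.
transfer_ms = 1 / 5 * 1000                 # 1 MB at 5 MB/sec -> 200 msec
latency_ms = 8
swap_ms = 2 * (transfer_ms + latency_ms)   # swap out + swap in
print(swap_ms)   # 416.0
```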


Schemes of Memory Allocation
- Contiguous allocation
- Paging
- Segmentation
- Segmentation with paging


Contiguous Allocation
- Main memory is usually divided into two partitions:
  - the resident operating system, usually held in low memory;
  - user processes, held in high memory.
- Why is the OS in low memory?
  - The interrupt vector table is located there.
- Single-partition allocation:
  - A relocation-register scheme is used to protect user processes from each other, and to keep operating-system code and data from being changed.
  - The relocation register contains the value of the smallest physical address; the limit register contains the range of logical addresses. Each logical address must be less than the limit register.
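As a minimal sketch of the relocation-register scheme described above (the register values here are invented for illustration, not from the slides):

```python
# Sketch of the relocation-register scheme: the limit register bounds
# logical addresses; the relocation register holds the partition base.
# Register values are illustrative assumptions.
RELOCATION = 100040   # smallest physical address of the partition
LIMIT = 74600         # every logical address must be below this

def map_address(logical):
    if logical >= LIMIT:
        raise MemoryError("trap: addressing error")
    return RELOCATION + logical   # the MMU adds the relocation register

print(map_address(200))   # 100240
```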


CA, Hardware Support for Relocation and Limit Registers
- The MMU maps the logical address dynamically by adding the value in the relocation register.

[Figure: the logical address from the CPU is compared with the limit register; if it is smaller, the relocation register is added to produce the physical address sent to memory; otherwise the MMU traps with an addressing error.]


CA, Memory Allocation

- The OS maintains a table of allocated memory blocks and available memory blocks.
- Initially, all memory is available, so there is one large block of available memory: one big hole.
- Holes of various sizes end up scattered throughout memory.
  - When a process arrives, it is allocated memory from a hole large enough to accommodate it.
  - If the hole is larger than needed, it is split into two parts:
    - one allocated to the new process,
    - the other returned to the pool of holes.
  - When a process terminates, it releases its block, which is placed back into the pool of holes.


CA, Dynamic Storage-Allocation Problem
- How to satisfy a request from the set of available holes:
  - Random-fit: choose an arbitrary hole among those large enough; poor utilization.
  - First-fit: allocate the first hole in the table that is big enough. Searches N/2 holes on average, where N is the number of holes.
  - Best-fit: allocate the smallest hole that is big enough; must search the entire list, unless it is ordered by size. Produces the smallest leftover hole. Time complexity is O(N).
  - Worst-fit: allocate the largest hole; must also search the entire list. Produces the largest leftover hole. Utilization is worse than FF and BF.
  - Next-fit: like FF, but start the search at the hole where the last placement was made, treating the list of holes as a circular list. Time complexity is O(N).
- Utilization = occupied_memory / total_memory.
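The first-fit and best-fit policies above can be sketched in a few lines of Python; the hole sizes are invented for the example, and this is an illustration of the selection rule only, not a full allocator:

```python
# Hypothetical sketch of first-fit and best-fit hole selection.
def first_fit(holes, size):
    """Return the index of the first hole large enough, or None."""
    for i, h in enumerate(holes):
        if h >= size:
            return i
    return None

def best_fit(holes, size):
    """Return the index of the smallest hole large enough, or None."""
    best = None
    for i, h in enumerate(holes):
        if h >= size and (best is None or h < holes[best]):
            best = i
    return best

holes = [100, 500, 200, 300, 600]    # hole sizes, in KB (made up)
print(first_fit(holes, 212))  # 1 (the 500 KB hole)
print(best_fit(holes, 212))   # 3 (the 300 KB hole)
```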


CA, Knuth's Findings
- For most placement policies, including BF and FF, the number of holes is on average approximately half the number of segments (the "fifty-percent rule"):
  - N segments,
  - N/2 unusable holes.
- BF and FF had about the same utilization.


CA, Fragmentation

- External fragmentation: enough total memory space exists to satisfy a request, but no single hole can hold the request.
- Internal fragmentation:
  - To address external fragmentation, memory can be partitioned into fixed-size blocks.
  - Memory is then allocated in units of the block size rather than in bytes.
  - Allocated memory may be slightly larger than the requested memory; this size difference is memory internal to a partition that is not being used.


CA, Compaction

- Reduce external fragmentation by compaction.
- Shuffle memory contents to place all free memory together in one large block.
- Compaction is possible only if relocation is dynamic and done at execution time.


CA, Binary Buddy Systems, Method

- The system keeps lists of holes, one list for each size 2^m, 2^(m+1), ..., 2^n, where 2^m is the smallest segment size.
- To place a segment of size s:
  - Round s up to the next power of 2, i.e., 2^k such that 2^(k-1) < s <= 2^k, and look in the list of holes of size 2^k.
    - If the list is not empty, place the segment in the first entry and remove that entry from the list.
    - If there are no holes of size 2^k, look in the list for size 2^(k+1), etc., until a hole is found; then recursively halve that hole until a block of the required size is obtained, and use it.
    - If a hole of size 2^(k+l) is used, holes of size 2^(k+l-1), 2^(k+l-2), ..., 2^k are left behind.
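A minimal Python sketch of the rounding-and-splitting procedure above; the free-list representation (a dict from power-of-two block size to a list of hole addresses) is an assumption made for illustration:

```python
# Sketch of a binary buddy allocator's rounding and splitting steps.
def round_up_pow2(s):
    """Smallest power of two >= s."""
    k = 1
    while k < s:
        k *= 2
    return k

def buddy_alloc(free_lists, s, max_size):
    """Allocate a rounded block; return (address, size) or None."""
    size = round_up_pow2(s)
    cur = size
    while cur <= max_size and not free_lists.get(cur):
        cur *= 2                      # look for a larger hole
    if cur > max_size:
        return None                   # no hole large enough: reject
    addr = free_lists[cur].pop(0)
    while cur > size:                 # split, keeping each upper buddy
        cur //= 2
        free_lists.setdefault(cur, []).append(addr + cur)
    return addr, size

free_lists = {1024: [0]}              # one big hole of size 1024 at address 0
print(buddy_alloc(free_lists, 68, 1024))  # (0, 128); leaves holes 512, 256, 128
```

Run on the slide deck's own example, a request of 68 out of 1024 rounds up to 128 and leaves holes of 512, 256, and 128; a follow-up request of 600 is rejected.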


CA, BDS (example)

- Assume memory of size 1024 and a memory request of 68.
  - 68 is rounded up to 128.
  - 1024 is partitioned into 512, 256, and 2 × 128;
  - one of the 128-size blocks is allocated for the request, leaving holes of 512, 256, and 128.
- If we then wished to place a segment of size 600, the buddy system would reject the request, even though the three neighboring holes add up to more than 600.


CA, BDS (Recombination)

- When a segment departs, the system checks whether its buddy (the block of the same size with the same parent in the binary tree) is vacant. If so, the two buddies are recombined, and the recombination process continues recursively up the tree until a non-vacant buddy is found.

[Figure: two vacant holes that are not buddies cannot be recombined.]


CA, Fibonacci System

- Fibonacci sequence:
  - 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, ...
- Memory utilization:
  - binary buddy: about 65%
  - Fibonacci buddy: about 60%

[Figure: Fibonacci buddy splitting, e.g., 89 = 55 + 34, 55 = 34 + 21, 34 = 21 + 13.]


CA, Weight System

- Power-of-2 splits: 2^k = 2^(k-2) + 3·2^(k-2).
- A new 3·2^p size splits as 3·2^p = 2^p + 2^(p+1).
- The weighted scheme has twice as many available sizes over a given size range as the binary scheme.
- New entries lie midway between the binary scheme's entries:
  - average internal fragmentation per segment is about half that of the binary scheme.

[Figure: weighted splitting of 1024: 1024 = 768 + 256; 768 = 512 + 256; 256 = 192 + 64; 512 = 384 + 128; 192 = 128 + 64.]

CA, Fibonacci and Weighted Schemes
- Both the Fibonacci and weighted schemes improve internal fragmentation, but
- they degrade external fragmentation so much that average utilization is worse than for the binary scheme.


CA, Dual Buddy Systems
- Keep two separate areas of memory:
  - one of size 2^k, operating a binary buddy system;
  - the other of size 3·2^p, also operating a binary buddy system.
- This yields all the same sizes as the weighted scheme, but
  - reduces external fragmentation.
- Overall utilization is better than in the binary buddy system.

[Figure: the 2^k area splits 1024 → 512 + 512 → 256 + 256, while the 3·2^p area splits 768 → 384 + 384 → 192 + 192.]


CA, Computing Mean Internal Fragmentation
- Assumptions:
  - buddy sizes: 8, 16, 32, 64, and 128;
  - segment sizes 10, 11, ..., 100 are all equally likely to arrive:
    P(s=10) = P(s=11) = ... = P(s=100) = 1/91;
  - f(s) = internal fragmentation for a segment of size s.
- Mean internal fragmentation = Σ_s P(s)·f(s):

  Σ_{s=10}^{100} P(s)·f(s)
    = (1/91) [ Σ_{s=10}^{16} f(s) + Σ_{s=17}^{32} f(s) + Σ_{s=33}^{64} f(s) + Σ_{s=65}^{100} f(s) ]
    = (1/91) [ Σ_{s=10}^{16} (16 - s) + Σ_{s=17}^{32} (32 - s) + Σ_{s=33}^{64} (64 - s) + Σ_{s=65}^{100} (128 - s) ]
    = (1/91) [ Σ_{i=0}^{6} i + Σ_{i=0}^{15} i + Σ_{i=0}^{31} i + Σ_{i=0}^{63} i - Σ_{i=0}^{27} i ]
    = (1/91) [ 6·7/2 + 15·16/2 + 31·32/2 + 63·64/2 - 27·28/2 ]
    = 25
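The derivation can be verified numerically. This sketch recomputes the mean under the same assumptions (binary buddy sizes starting at 8, segment sizes 10..100 equally likely):

```python
# Numerically checking the slide's mean internal fragmentation of 25.
def rounded(s):
    """Round a segment size up to the next buddy size (8, 16, 32, ...)."""
    k = 8
    while k < s:
        k *= 2
    return k

sizes = range(10, 101)                 # 91 equally likely segment sizes
mean_frag = sum(rounded(s) - s for s in sizes) / len(sizes)
print(mean_frag)   # 25.0
```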

Paging
- Allows the physical address space of a process to be noncontiguous.
- Divide physical memory into fixed-sized blocks called frames (the size is a power of 2; why?).
- Divide logical memory into blocks of the same size, called pages.
- To run a program of size n pages, n free frames must be found.
- Set up a page table to translate logical to physical addresses.
- Paging is usually required for virtual memory, which introduces unpredictably long delays. Hence, real-time systems mostly have no virtual memory. However, the techniques used in paging are valuable in many real-time system architectures (e.g., IP routers, ATM switches).


Address Translation Scheme

- An address generated by the CPU is divided into:
  - Page number (p): used as an index into the page table, which contains the base address of each page in physical memory.
  - Page offset (d): combined with that base address to define the physical memory address that is sent to the memory unit.
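A toy translation routine following the (p, d) split above; the 12-bit offset (i.e., 4 KB pages) and the page-table contents are assumptions chosen for illustration:

```python
# Splitting a logical address into page number and offset, assuming
# a 4 KB page size (12-bit offset) purely for illustration.
OFFSET_BITS = 12
PAGE_SIZE = 1 << OFFSET_BITS

def translate(logical, page_table):
    p = logical >> OFFSET_BITS        # page number: index into page table
    d = logical & (PAGE_SIZE - 1)     # offset within the page
    frame = page_table[p]             # frame holding this page
    return (frame << OFFSET_BITS) | d

page_table = {0: 5, 1: 6, 2: 1}       # made-up page -> frame mapping
print(hex(translate(0x1ABC, page_table)))  # page 1 -> frame 6 => 0x6abc
```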


Address Translation Architecture


Paging Example


Paging Example

[Figure: pages 0-3 of a logical memory are mapped through the page table into four of the eight frames (0-7) of physical memory.]


Internal Fragmentation
- The paging scheme has no external fragmentation, but
- some internal fragmentation.
- Example:
  - assume the page size is 2 KB;
  - assume the size of a memory request is random;
  - what is the average internal fragmentation? (on average, about half a page)
- A small page size:
  - is desirable to reduce internal fragmentation, but
  - increases the page-table overhead for each process.
- Example: 32-bit address, 4 bytes per page-table entry:
  - 2 KB page size ⇒ 11-bit offset ⇒ 2^21 entries ⇒ 8 MB page table
  - 8 KB page size ⇒ 13-bit offset ⇒ 2^19 entries ⇒ 2 MB page table


Free Frames and Allocation

[Figure: the free-frame list before and after allocation.]


Implementation of Page Table

- The page table is stored in main memory because of its size.
- The page-table base register (PTBR) points to the page table.
- In this scheme, every memory access requires two memory accesses:
  - one for the page table, and
  - one for the data itself.
- The two-memory-access problem can be solved with a special fast-lookup hardware cache, called associative memory or content-addressable memory (CAM), used as translation look-aside buffers (TLBs).


TLB
- Associative memory supports parallel search; each entry is a (page # key, frame # value) pair.
- Address translation of a page number A':
  - A' is compared with all keys (page #s) simultaneously; if found, the value (frame #) is returned.
  - If not found, get the frame # from the page table in memory and add the (page #, frame #) pair to the TLB.
  - If the TLB is full, the OS selects a victim entry for replacement.
  - Who will be the victim? LRU (least recently used) or random.
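A minimal TLB model with LRU replacement, one of the victim policies named above; the capacity and page-table contents are invented for the example:

```python
# Toy TLB with LRU replacement, backed by a plain-dict "page table".
from collections import OrderedDict

class TLB:
    def __init__(self, capacity, page_table):
        self.capacity = capacity
        self.entries = OrderedDict()   # page # -> frame #, in LRU order
        self.page_table = page_table

    def lookup(self, page):
        """Return (frame, hit?) for a page number."""
        if page in self.entries:               # TLB hit
            self.entries.move_to_end(page)     # mark as most recently used
            return self.entries[page], True
        frame = self.page_table[page]          # miss: walk the page table
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)   # evict the LRU victim
        self.entries[page] = frame
        return frame, False

tlb = TLB(2, {0: 7, 1: 3, 2: 9})
print(tlb.lookup(0))  # (7, False)  miss, loaded from the page table
print(tlb.lookup(0))  # (7, True)   hit
```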


Paging Hardware With TLB


TLB, ASID (Address Space ID)
- Identifies which process generated the memory address.
- Several processes can then share a TLB.
- On a context switch, the OS need not flush all TLB entries.


Effective Access Time
- Associative lookup takes ε time units.
- Assume the memory cycle time is 1 time unit.
- Hit ratio (r): percentage of times that a page number is found in the associative registers; the ratio is related to the number of associative registers.
- Effective access time:

  T_effective = (1 + ε)·r + (2 + ε)·(1 − r) = 2 + ε − r
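Plugging sample numbers into the formula above (the values of ε and r here are assumed, not from the slides):

```python
# Effective access time with a TLB:
#   T = (1 + eps)*r + (2 + eps)*(1 - r) = 2 + eps - r
# eps = TLB lookup time, r = hit ratio, memory cycle = 1 time unit.
def eat(eps, r):
    return (1 + eps) * r + (2 + eps) * (1 - r)

print(round(eat(0.2, 0.8), 6))   # 1.4 (= 2 + 0.2 - 0.8)
```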


Memory Protection

- Memory protection is implemented by associating protection bits with each frame.
- A valid-invalid bit is attached to each entry in the page table:
  - "valid" indicates that the associated page is in the process's logical address space and is thus a legal page;
  - "invalid" indicates that the page is not in the process's logical address space.


Valid (v) or Invalid (i) Bit In A Page Table


Page Table Structure
- Most modern systems support a large logical address space (32-bit to 64-bit).
  - The page table itself becomes excessively large.
  - e.g., a 32-bit system with 4 KB pages has 1M page-table entries; at 4 bytes per entry, that is 4 MB for the page table alone. Yet in most cases user programs do not need the whole logical address space: a 200 KB process still gets a 4 MB page table!
- Solution: organize the page table.
  - Hierarchical paging
  - Hashed page tables
  - Inverted page tables


Hierarchical Page Tables
- Break up the logical address space into multiple levels of page tables.
- A simple technique is the two-level page table.


Two-Level Paging Example

- A logical address (on a 32-bit machine with a 4K page size) is divided into:
  - a page number consisting of 20 bits;
  - a page offset consisting of 12 bits.
- Since the page table is paged, the page number is further divided into:
  - a 10-bit outer page number;
  - a 10-bit inner page number.
- Thus, a logical address is laid out as:

  | p1 (10 bits) | p2 (10 bits) | d (12 bits) |

  where p1 is an index into the outer page table, and p2 is the displacement within the page of the outer page table.
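The 10/10/12 split above can be expressed with shifts and masks; the sample address below is arbitrary:

```python
# Splitting a 32-bit logical address for two-level paging: 10 | 10 | 12.
def split(addr):
    p1 = addr >> 22                 # index into the outer page table
    p2 = (addr >> 12) & 0x3FF       # index within an inner page table
    d  = addr & 0xFFF               # page offset
    return p1, p2, d

print(split(0xDEADBEEF))   # (890, 731, 3823)
```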


Two-Level Page-Table Scheme


Address-Translation Scheme
- Address-translation scheme for a two-level 32-bit paging architecture.


High-Level Page Tables
- Some systems support more than two levels of paging.
- e.g.:
  - SPARC supports 3-level paging;
  - the 32-bit MC68030 supports 4-level paging.


Hashed Page Tables

- A 64-bit system would need about seven levels of hierarchical page tables: far too many memory accesses per address translation.
- The hashed page table is an approach for handling address spaces larger than 32 bits.
- The virtual page number is hashed into a page table. Each entry of this page table contains a chain of elements hashing to the same location.
- Virtual page numbers are compared in this chain, searching for a match. If a match is found, the corresponding physical frame is extracted.
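A bucket-and-chain sketch of the hashed page table described above; the bucket count, hash function, and virtual page numbers are invented for the example:

```python
# Hashed page table sketch: each bucket holds a chain of (vpn, frame)
# pairs whose virtual page numbers hash to the same slot.
NBUCKETS = 16
table = [[] for _ in range(NBUCKETS)]

def insert(vpn, frame):
    table[vpn % NBUCKETS].append((vpn, frame))

def lookup(vpn):
    for v, f in table[vpn % NBUCKETS]:   # walk the chain for a match
        if v == vpn:
            return f
    return None                           # no translation found

insert(0x12345, 7)
insert(0x12345 + NBUCKETS, 9)   # hashes into the same bucket (collision)
print(lookup(0x12345))          # 7
```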


Hashed Page Table


Inverted Page Table
- Each process needs a page table, but all of them map into a single physical memory.
  - e.g., assume 100 processes, hence 100 page tables. As a whole, all the page tables map to one physical memory, and many of their entries are simply wasted memory.
- Solution: the inverted page table.
  - One entry for each real page (frame) of memory.
  - Hence, it maps from a physical address to a logical address.
  - Each entry holds the virtual address of the page stored in that real memory location, with information about the process that owns the page.
  - Decreases the memory needed to store page tables, but increases the time needed to search the table when a page reference occurs.
  - Use a hash table to limit the search to one, or at most a few, page-table entries.


Inverted Page Table Architecture


Shared Pages

- One copy of read-only (reentrant) code can be shared among processes (e.g., text editors, compilers, window systems).
  - The code must be reentrant.
  - Shared code must appear in the same location in the logical address space of all processes.
  - Similar to threads sharing the address space of a task.
  - Shared pages are difficult in systems using an inverted page table,
    - because each physical frame maps to only one virtual page.


Shared Pages, Example


Segmentation
- A memory-management scheme that supports the user's view of memory.
- A program is a collection of segments. A segment is a logical unit such as:
  - main program, procedure, function, method, object, local variables, global variables, common block, stack, symbol table, arrays.


Segmentation Hardware


User's View of a Program


Logical View of Segmentation

[Figure: segments 1-4 in the user space are mapped to noncontiguous regions of the physical memory space.]


Segmentation Architecture

- A logical address consists of a two-tuple: <segment-number, offset>.
- Segment table: maps the two-dimensional logical addresses to one-dimensional physical addresses; each table entry has:
  - base: the starting physical address where the segment resides in memory;
  - limit: the length of the segment.
- Segment-table base register (STBR): points to the segment table's location in memory.
- Segment-table length register (STLR): indicates the number of segments used by a program; a segment number s is legal if s < STLR.
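A sketch of segment-table translation with the limit check described above; the segment-table contents are invented for the example:

```python
# Segment translation sketch: logical <s, d> -> base + d, with a limit
# check that traps when the offset exceeds the segment length.
SEG_TABLE = {0: (1400, 1000), 1: (6300, 400)}   # s -> (base, limit)

def seg_translate(s, d):
    base, limit = SEG_TABLE[s]
    if d >= limit:                     # offset beyond the segment length
        raise MemoryError("trap: addressing error")
    return base + d

print(seg_translate(1, 53))   # 6353
```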


Example of Segmentation


Segmentation Architecture (Cont.)
- Relocation:
  - dynamic,
  - by segment table.
- Sharing:
  - shared segments,
  - same segment number.
- Allocation:
  - first fit / best fit,
  - external fragmentation.


Segmentation Architecture (Cont.)
- Protection. With each entry in the segment table, associate:
  - a validation bit (validation bit = 0 ⇒ illegal segment),
  - read/write/execute privileges.
- Protection bits are associated with segments; code sharing occurs at the segment level.
- Since segments vary in length, memory allocation is a dynamic storage-allocation problem.


Sharing of Segments


Segmentation with Paging

- Combine paging and segmentation.
- The Intel 386 uses segmentation with paging for memory management, with a two-level paging scheme.


Notes

- Chapter 10, Virtual Memory, is very important for building an OS efficiently.
- Many performance aspects of OSes are measured through the virtual memory system.
- Even so, this class will not cover that chapter, because most real-time operating systems do not support virtual memory:
  - secondary storage is usually not an RTOS or embedded-system component (counterexample: the iPod);
  - execution time cannot be predicted with virtual memory.
- But I strongly recommend that you read the chapter on your own.

