
Memory Management in Linux

By: Rohan Garg (2002134), Gaurav Gupta (2002435)

Architecture Independent Memory Model

• Memory is divided into pages.
• The page size is given by the PAGE_SIZE macro in asm/page.h (4 KB for x86 and 8 KB for Alpha).
• The pages are divided between 4 segments: User Code, User Data, Kernel Code, Kernel Data.
• In User mode, a process can access only User Code and User Data.
• But in Kernel mode, access is also needed to User Data.

Addressing the Memory

• Segment + Offset = linear address; the linear address space is 4 GB (32 bits).
• Of this, user space = 3 GB (defined by the TASK_SIZE macro) and kernel space = 1 GB.
• A linear address is converted to a physical address using 3 levels of page tables.

[Diagram: a linear address is split into | Index into Page Dir. | Index into Page Middle Dir. | Index into Page Table | Offset |]

Requesting and Releasing Page Frames

• alloc_pages(gfp_mask, order) :- used to request 2^order contiguous page frames.
• alloc_page(gfp_mask) :- returns the address of the descriptor of the allocated page frame (for a single page only).
• __get_free_pages(gfp_mask, order) :- returns the linear address of the first allocated page.
• get_zeroed_page(gfp_mask) :- first invokes alloc_pages and then fills the page with zeros.
• __get_dma_pages(gfp_mask, order) :- gets page frames suitable for DMA.

GFP mask

• The flag specifies how to look for free page frames.
• E.g. __GFP_WAIT :- the kernel is allowed to block the current process while waiting for free page frames.

Freeing page frames

• __free_pages(page, order) :- decreases the count field of the descriptor by 1; if it drops to 0, frees the 2^order contiguous page frames.
• free_pages(addr, order) :- like __free_pages(), but takes the linear address addr of the first page frame instead of its descriptor.
• __free_page(page) :- releases the single page frame having page descriptor page.
• free_page(addr) :- releases the single page frame having linear address addr.

Finding a Physical Page

• unsigned long __get_free_pages(int priority, unsigned long order, int dma), in mm/page_alloc.c.

• Priority =
  • GFP_BUFFER (free page returned only if available in physical memory)
  • GFP_ATOMIC (return page if possible; do not block the current process)
  • GFP_USER (current process can be interrupted)
  • GFP_KERNEL (kernel can be interrupted)

  • GFP_NOBUFFER (do not attempt to reduce the buffer cache)
• order says: give me 2^order pages (max is 128 KB)
• dma specifies that the allocation is for DMA purposes

Page descriptor

• Used to keep track of the current status of each page frame.
• Some of the key fields of the structure are described below:
  • list :- contains pointers to the next and previous items in a doubly linked list of page descriptors.
  • count :- usage reference counter for the page; a value greater than 0 implies that the page frame is in use by one or more processes.
  • flags :- describes the status of the page frame.
  • lru :- contains pointers for the least-recently-used doubly linked list of pages.
  • zone :- the zone to which the page frame belongs.

Buddy System Algorithm

• Used for allocating groups of contiguous page frames; helps solve the problem of external fragmentation.
• All free page frames are grouped into lists of blocks containing groups of 1, 2, 4, 8, ..., 512 contiguous page frames.

• If 128 contiguous page frames are required, the 128-frame list is consulted first. If no block is found there, the 256-frame list is consulted; if a 256-frame block is found, it is split: 128 frames are allocated and the remaining 128 frames are added to the 128-frame list. Failing that, the 512-frame list is consulted, and so on.

Slab

• Runs on top of the basic buddy system algorithm.
• It does not discard already-allocated objects but keeps them cached in memory, thus avoiding reinitialization.
• Creates pools of memory areas of the same type, called caches.
• Caches are divided into slabs, each slab consisting of one or more contiguous page frames.
• The slab allocator never releases the page frames of an empty slab unless the kernel is looking for additional free page frames.

Interface between slab allocator and buddy system

void *kmem_getpages(kmem_cache_t *cachep, unsigned long flags)
{
    void *addr;
    flags |= cachep->gfpflags;
    addr = (void *) __get_free_pages(flags, cachep->gfporder);
    return addr;
}

• The slab allocator invokes this function, which calls into the buddy system algorithm to obtain a group of free contiguous page frames.
• Similarly, kmem_freepages() is used by the slab allocator to release a group of page frames.

Process Address Space

[Diagram: process address space layout, top down]
  0xC0000000 : Kernel
               File name, environment
               Arguments
               Stack (grows downward)
               Shared libs
  _end, bss, _bss_start, _edata, Data, _etext, Code, Header (0x84000000)

Address Space Descriptor

• mm_struct is defined in the process descriptor (in linux/sched.h).
• This descriptor is shared if CLONE_VM is specified on forking (otherwise it is duplicated).

struct mm_struct {
    int count;                       // no. of processes sharing this descriptor
    pgd_t *pgd;                      // page directory ptr
    unsigned long start_code, end_code;
    unsigned long start_data, end_data;
    unsigned long start_brk, brk;
    unsigned long start_stack;
    unsigned long arg_start, arg_end, env_start, env_end;
    unsigned long rss;               // no. of pages resident in memory
    unsigned long total_vm;          // total # of bytes in this address space
    unsigned long locked_vm;         // # of bytes locked in memory
    unsigned long def_flags;         // status to use when mem regions are created
    struct vm_area_struct *mmap;     // ptr to first region desc.
    struct vm_area_struct *mmap_avl; // for faster search of region descs.
};

Memory Allocation for Kernel Segment

• Static:
      memory_start = console_init(memory_start, memory_end);
  Typically done for drivers to reserve areas, and for some other kernel components.
• Dynamic:
      void *kmalloc(size, priority);   void kfree(void *);
      void *vmalloc(size);             void vfree(void *);

• kmalloc is used for physically contiguous pages, while vmalloc does not necessarily allocate physically contiguous pages.
• Memory allocated is not initialized (and is not paged out).

kmalloc() data structures

[Diagram: the sizes[] array of size descriptors, one per block size 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536 and 131072 bytes; each size descriptor points to lists of page descriptors holding free blocks (bh), with the largest classes possibly NULL.]

vmalloc()

• Allocates virtually contiguous pages, but they do not need to be physically contiguous.
• Uses __get_free_page() to allocate the physical frames.
• Once all the required physical frames are found, the virtual addresses are created (and mappings set) in an unused part of the kernel's virtual address space.
• The virtual address search (for unused parts) on x86 begins at the next address after physical memory, on an 8 MB boundary.

• One (virtual) page is left free after each allocation, as a cushion against overruns.

vmalloc vs kmalloc

• Contiguous vs. non-contiguous physical memory.
• kmalloc is faster but less flexible.
• vmalloc involves __get_free_page() and may need to block to find a free physical page.
• DMA requires contiguous physical memory.
• All kernel segment pages are locked in memory (no swapping).

• User pages can be paged out to:
  • a complete block device (swap partition), or
  • fixed-length files in a file system.

• The first 4096 bytes are a bitmap indicating, for each bit set, that the space for that page is available for paging.
• At byte 4086, the string "SWAP-SPACE" is stored.
• Hence, max swap = 4086*8 - 1 = 32687 pages = 130748 KB per device or file.
• MAX_SWAPFILES specifies the number of swap files or devices.

• A swap device is more efficient than a swap file.

Page Fault

• The error code is written onto the stack, and the faulting virtual address is stored in register CR2.
• do_page_fault(struct pt_regs *regs, unsigned long error_code) is then called.
• If the faulting address is in the kernel segment, alarm messages are printed out and the process is terminated.
• If the faulting address is not in a virtual memory area, check whether VM_GROWSDOWN is set for the next virtual memory area (i.e. the stack). If so, expand the VM area; if the expansion fails, send SIGSEGV.

• If the faulting address is in a virtual memory area, check whether the protection bits are OK. If the access is not legal, send SIGSEGV; else call do_no_page() or do_wp_page().

Page Replacement Algorithm

• LRU :- Least Recently Used replacement
• NFU :- Not Frequently Used replacement
• Page-ageing-based replacement
• Working Set algorithm, based on locality of reference per process
• Working-Set-based clock algorithms
• LRU with ageing and the Working Set algorithms are efficient and commonly used

Page Replacement Handling in Linux

• Page Cache
  • Pages are added to the page cache for fast lookup.
  • Page cache pages are hashed based on their address space and page index.
  • Inode or disk-block pages, shared pages and anonymous pages form the page cache.
  • Swap-cached pages, also part of the page cache, represent the swapped pages.
  • Anonymous pages enter the swap cache at swap-out time; shared pages enter when they become dirty.

LRU Cache

• The LRU cache is made up of active lists and inactive lists.
• These lists are populated during page faults and when page-cached pages are accessed or referenced.
• kswapd is the page-out kernel thread that balances the LRU cache and trickles out pages based on an approximation of the LRU algorithm.
• The active lists contain referenced pages; they are monitored for page references through refill_inactive.
• Referenced pages are given a chance to age through move-to-front; unreferenced pages are moved to the inactive list.
• The inactive lists contain the sets of inactive-clean and inactive-dirty pages.
• These sets are monitored periodically, whenever the pages_high threshold for free pages is crossed on a per-zone basis.

Thank you