Linux 2.6 Memory Management

Joseph Garvin

Why do we care?

Without keeping multiple processes in memory at once, we lose all the hard work we just did on scheduling. Multiple processes have to be kept in memory in order for scheduling to be effective.

Zones

ZONE_DMA (0-16MB) Pages capable of undergoing DMA

ZONE_NORMAL (16-896MB) Normal pages

ZONE_HIGHMEM (896MB+) Pages not permanently mapped to kernel address space

Zones (cont.)

ZONE_DMA needed because hardware can only perform DMA on certain addresses.

But why is ZONE_HIGHMEM needed?

Zones (cont. 2)

That's a Complicated Question

A 32-bit processor can address up to 4GB of logical address space.

The kernel cuts a deal with processes: "I get 1GB of space, you get 3GB" (hardcoded).

The kernel can only manipulate memory mapped into its address space.

Zones (cont. 3)

128MB of the kernel's 1GB of logical address space is automatically set aside for dynamic mappings (the vmalloc area and temporary high-memory mappings).

1024MB – 128MB = 896MB

If you have > 896MB of physical memory, you need to compile your kernel with high memory support.

Zones (cont. 4)

What does it actually do?

On the fly, it unmaps memory from ZONE_NORMAL in the kernel address space and exchanges it with memory in ZONE_HIGHMEM.

Enabling high memory has a small performance hit.

Segmentation

-Segmentation is old school

-The kernel tries to avoid using it, because it's not very portable

-The main reason the kernel makes use of it at all is compatibility with architectures that need it.

Segmentation (cont.)

Kernel makes 6 segments on Pentium:

1. A segment for kernel code
2. A segment for kernel data
3. A segment for user code
4. A segment for user data
5. A task-state segment (TSS)
6. A default local-descriptor-table (LDT) segment

Segmentation (cont. 2)

But it doesn't actually matter!

They're all set to the same address range -- the largest possible: 0x00000000-0xffffffff.

Again, segmentation is used only for compatibility.

How to Allocate Kernel Memory

struct page * alloc_pages(unsigned int gfp_mask, unsigned int order)

Allocates 2^order contiguous pages and returns a pointer to the first one. All of the other kernel memory allocation mechanisms are built on this function.

void * page_address(struct page *page)

I'll give you 3 guesses.

unsigned long __get_free_pages(unsigned int gfp_mask, unsigned int order)

Like alloc_pages, but returns the logical address of the first requested page.

unsigned long get_zeroed_page(unsigned int gfp_mask)

This fills the page with zeroes before giving it to us.

Why might this be a good idea?

Matching free functions:

void __free_pages(struct page *page, unsigned int order)

void free_pages(unsigned long addr, unsigned int order)

void free_page(unsigned long addr)

void * kmalloc(size_t size, int flags)

Most of the time, kernel code uses kmalloc. kmalloc is like malloc -- it allocates in bytes. It can take special flags that indicate how the allocation should be performed -- GFP_ATOMIC, for example, tells kmalloc it's not allowed to sleep. Why might that be useful?

void * vmalloc(unsigned long size)

kmalloc always allocates _physically contiguous memory_. vmalloc allocates memory that is only virtually contiguous -- the underlying physical pages can be scattered throughout the system. Why not always vmalloc?

Matching free functions:

void kfree(const void *ptr)

void vfree(void *addr)

Slab Allocator

This is the kernel. We need to be ludicrously efficient. We just spent a bunch of time in class discussing how immensely clever page tables are. But that will not prevent us from throwing them to the wind.

Page tables will suffice for user level code but kernel level code needs to be more space and speed efficient.

1. Create a cache for a type of object, say task_struct
2. Allocate pages to store task_structs in (slabs)
3. Tightly pack task_structs inside the pages and reuse them

Benefits:

1. Tightly packed structures take up less space.

2. By reusing already-allocated memory we avoid costly allocations.

Let's look at how we make a cache. Look how scary this is:

kmem_cache_t * kmem_cache_create(const char *name, size_t size,
                                 size_t align, unsigned long flags,
                                 void (*ctor)(void*, kmem_cache_t *, unsigned long),
                                 void (*dtor)(void*, kmem_cache_t *, unsigned long));

A little context may help... kernel/fork.c

The Kernel is Special

In the kernel there are a lot more heap allocations than in user-level code. Why might that be?

Because the kernel only gets 1-2 pages of stack space! On x86 that's 8KB. Why so small?

"When each process is given a small, fixed stack, memory consumption is minimized and the kernel need not burden itself with stack management code." - Robert Love

*This is why Reiser4 crashes with 4KB stacks; its call chain is too big.

Page Tables

The kernel uses a 3-level page table.

*On 32-bit architectures the middle page table is simply ignored (its size is set to 1).

*This is “good enough” because 64-bit architectures throw out 21 bits of addressing power.

How To Get a Frame:

pmd = pmd_offset(pgd, address);
pte = *pte_offset_map(pmd, address);
page = pte_page(pte);

(PGD) = Layer 1 = “Page Global Directory”
(PMD) = Layer 2 = “Page Middle Directory”
(PTE) = Layer 3 = “Page Table Entry”

Sources

Linux Kernel Development 2nd Edition by Robert Love (Chapters 11, 14, and 15)

Linux Memory Management Wiki: http://linux-mm.org/

kerneltrap.org article “Feature: High Memory In The Linux Kernel”: http://kerneltrap.org/node/2450

Freenode.net #linux #kernel

Explore the Linux memory model: http://www-128.ibm.com/developerworks/linux/library/l-memmod/index.html

Kernel comparison: Improved memory management in the 2.6 kernel: http://www-128.ibm.com/developerworks/linux/library/l-mem26/

x86-64 has support for 4-level page tables; experimental kernel patches are available for this: http://lwn.net/Articles/106177/