Linux 2.6 Memory Management

Joseph Garvin

Why do we care?

Without keeping multiple processes in memory at once, we lose all the hard work we just did on scheduling. Multiple processes have to be kept in memory in order for scheduling to be effective.

Zones

ZONE_DMA (0-16MB) Pages capable of undergoing DMA

ZONE_NORMAL (16-896MB) Normal pages

ZONE_HIGHMEM (896MB+) Pages not permanently mapped to kernel address space

Zones (cont.)

ZONE_DMA needed because hardware can only perform DMA on certain addresses.

But why is ZONE_HIGHMEM needed?

Zones (cont. 2)

That's a Complicated Question

A 32-bit processor can address up to 4GB of logical address space.

The kernel cuts a deal with processes: "I get 1GB of space, you get 3GB" (hardcoded).

The kernel can only manipulate memory mapped into its address space.

Zones (cont. 3)

128MB of the kernel's 1GB of logical address space is automatically set aside for dynamic mappings (the vmalloc area and temporary high-memory mappings).

1024MB – 128MB = 896MB

If you have > 896MB of physical memory, you need to compile your kernel with high memory support.

Zones (cont. 4)

What does it actually do?

On the fly, it unmaps memory from ZONE_NORMAL in the kernel address space and exchanges it with memory in ZONE_HIGHMEM.

Enabling high memory has a small performance hit.

Segmentation

-Segmentation is old school

-The kernel tries to avoid using it, because it's not very portable

-The main reason the kernel makes use of it at all is compatibility with architectures that need it.

Segmentation (cont.)

Kernel makes 6 segments on Pentium:

1. A segment for kernel code
2. A segment for kernel data
3. A segment for user code
4. A segment for user data
5. A task-state segment (TSS)
6. A default local-descriptor-table (LDT) segment

Segmentation (cont. 2)

But it doesn't actually matter!

They're all set to the same address range -- the largest possible: 0x00000000-0xffffffff.

Again, segmentation is used only for compatibility.

How to Allocate Kernel Memory

struct page * alloc_pages(unsigned int gfp_mask, unsigned int order)

Allocates 2^order contiguous pages and returns a pointer to the first one. All of the other kernel memory allocation mechanisms are built on this function.

void * page_address(struct page *page)

I'll give you 3 guesses.

unsigned long __get_free_pages(unsigned int gfp_mask, unsigned int order)

Like alloc_pages, but returns the logical address of the first requested page.

unsigned long get_zeroed_page(unsigned int gfp_mask)

This fills the page with zeroes before giving it to us.

Why might this be a good idea?

Matching free functions:

void __free_pages(struct page *page, unsigned int order)

void free_pages(unsigned long addr, unsigned int order)

void free_page(unsigned long addr)

void * kmalloc(size_t size, int flags)

Most of the time, kernel code uses kmalloc. kmalloc is like malloc -- it allocates in bytes. It can take special flags that indicate how the allocation should be performed -- GFP_ATOMIC, for example, tells kmalloc it's not allowed to sleep. Why might that be useful?

void * vmalloc(unsigned long size)

kmalloc always allocates _physically contiguous memory_. vmalloc allocates memory that is only virtually contiguous -- the underlying physical pages can be scattered throughout the system. Why not always vmalloc?

Matching free functions:

void kfree(const void *ptr)

void vfree(void *addr)

Slab Allocator

This is the kernel. We need to be ludicrously efficient. We just spent a bunch of time in class discussing how immensely clever page tables are. But that will not prevent us from throwing them to the wind.

Page tables will suffice for user level code but kernel level code needs to be more space and speed efficient.

1. Create a cache for a type of object, say task_struct
2. Allocate pages to store task_structs in (slabs)
3. Tightly pack task_structs inside the pages and reuse them

Benefits:

1. Tightly packed structures take up less space.

2. By reusing already-allocated memory we avoid costly allocations.

Let's look at how we make a cache. Look how scary this is:

kmem_cache_t * kmem_cache_create(const char *name, size_t size,
                                 size_t align, unsigned long flags,
                                 void (*ctor)(void*, kmem_cache_t *, unsigned long),
                                 void (*dtor)(void*, kmem_cache_t *, unsigned long));

A little context may help... kernel/fork.c

The Kernel is Special

In the kernel there are a lot more heap allocations than in user-level code. Why might that be?

Because the kernel only gets 1-2 pages of stack space! On x86 that's 8KB. Why so small?

"When each process is given a small, fixed stack, memory consumption is minimized and the kernel need not burden itself with stack management code." - Robert Love

*This is why Reiser4 crashes with 4KB stacks; its call chain is too big.

Page Tables

The kernel uses a 3-level page table.

*On 32-bit architectures the middle page table is simply ignored (its size is set to 1).

*This is “good enough” because 64-bit architectures throw out 21 bits of addressing power.

How To Get a Frame:

pmd = pmd_offset(pgd, address);
pte = *pte_offset_map(pmd, address);
page = pte_page(pte);

(PGD) = Layer 1 = “Page Global Directory”
(PMD) = Layer 2 = “Page Middle Directory”
(PTE) = Layer 3 = “Page Table Entry”

Sources

Linux Kernel Development 2nd Edition by Robert Love (Chapters 11, 14, and 15)

Linux Memory Management Wiki: http://linux-mm.org/

kerneltrap.org article “Feature: High Memory In The Linux Kernel”: http://kerneltrap.org/node/2450

Freenode.net #linux #kernel

Explore the Linux memory model: http://www-128.ibm.com/developerworks/linux/library/l-memmod/index.html

Kernel comparison: Improved memory management in the 2.6 kernel: http://www-128.ibm.com/developerworks/linux/library/l-mem26/

x86-64 has support for 4-level page tables; experimental kernel patches are available for this: http://lwn.net/Articles/106177/