COMP9242 Advanced OS
S2/2016 W01: Introduction to seL4
@GernotHeiser

Copyright Notice

These slides are distributed under the Creative Commons Attribution 3.0 License
• You are free:
  – to share: to copy, distribute and transmit the work
  – to remix: to adapt the work
• under the following conditions:
  – Attribution: You must attribute the work (but not in any way that suggests that the author endorses you or your use of the work) as follows: “Courtesy of , UNSW Australia”

The complete license text can be found at http://creativecommons.org/licenses/by/3.0/legalcode

2 COMP9242 S2/2016 W01 © 2016 Gernot Heiser. Distributed under CC Attribution License

Monolithic Kernels vs Microkernels

• Idea of microkernel:
  – Flexible, minimal platform
  – Mechanisms, not policies
  – Goes back to Nucleus [Brinch Hansen, CACM’70]

[Figure: a monolithic kernel (application above the syscall interface in user mode; VFS, IPC, file system, scheduler, virtual memory, device drivers and dispatcher in kernel mode) beside a microkernel system (application, Unix server, file server and device driver in user mode; IPC and virtual memory in kernel mode), each on top of the hardware]

Microkernel Evolution

• First generation, eg [’87]
  – In kernel: memory objects, low-level FS, swapping, devices, kernel memory, scheduling, IPC, MMU abstraction
  – 180 syscalls
  – 100 kLOC
  – 100 µs IPC
• Second generation, eg L4 [’95]
  – In kernel: kernel memory, scheduling, IPC, MMU abstraction
  – ~7 syscalls
  – ~10 kLOC
  – ~1 µs IPC
• Third generation: seL4 [’09]
  – In kernel: scheduling, IPC, MMU abstraction (memory management is a user-level library)
  – ~3 syscalls
  – 9 kLOC
  – 0.1 µs IPC
  – capabilities
  – design for isolation
• Note: Composite kernel does user-mode scheduling

2nd-Generation Microkernels

• 1st-generation kernels (Mach, Chorus) were a failure
  – Complex, inflexible, slow
• L4 was first 2G microkernel [Liedtke, SOSP’93, SOSP’95]
  – Radical simplification & manual micro-optimisation
  – “A concept is tolerated inside the microkernel only if moving it outside the kernel, i.e. permitting competing implementations, would prevent the implementation of the system’s required functionality.”
  – High IPC performance
• Family of L4 kernels:
  – Original Liedtke (GMD) assembler kernel (‘95)
  – Family of kernels developed by Dresden, UNSW/NICTA, Karlsruhe
  – Commercial clones (PikeOS, P4, CodeZero, …)
  – Influenced commercial QNX (‘82), Green Hills Integrity (‘90s)
  – Generated NICTA startup (OK Labs)
    o large-scale commercial deployment (multiple billions shipped)

L4: A Family of High-Performance Microkernels

[Figure: L4 family tree on a 1993–2013 timeline, showing API inheritance and code inheritance between L3 → L4, “X”, Hazelnut, Pistachio, L4/MIPS, L4/Alpha, L4-embed., OKL4 µKernel, OKL4 Microvisor, seL4, Fiasco, Fiasco.OC, NOVA, Codezero and P4 → PikeOS; grouped by origin (GMD/IBM/Karlsruhe, UNSW/NICTA, Dresden, OK Labs, commercial clone); annotations: first L4 kernel with capabilities, Qualcomm modem chips, iOS security co-processor]


Issues of 2G Microkernels

• L4 solved performance issue [Härtig et al, SOSP’97]
• Left a number of security issues unsolved
• Problem: ad-hoc approach to protection and resource management
  – Global thread name space → covert channels [Shapiro’03]
  – Threads as IPC targets → insufficient encapsulation
  – Single kernel memory pool → DoS attacks
  – Insufficient delegation of authority → limited flexibility, performance
  – Unprincipled management of time
• Addressed by seL4
  – Designed to support safety- and security-critical systems
  – Principled time management not yet mainline (RT branch)

seL4 Principles

• Single protection mechanism: capabilities
  – Proper time management to be finished this year
• All resource-management policy at user level
  – Painful to use
  – Need to provide standard memory-management library
    o Results in L4-like programming model
• Suitable for formal verification (proof of implementation correctness)
  – Attempted since ‘70s
  – Finally achieved by L4.verified project at NICTA [Klein et al, SOSP’09]

seL4 Concepts

Note: differences between AOS and mainline kernels!

• Capabilities (Caps)
  – mediate access
• Kernel objects:
  – Threads (thread-control blocks: TCBs)
  – Address spaces (page table objects: PDs, PTs)
  – Endpoints (IPC EPs, Notification AEPs)
  – Capability spaces (CNodes)
  – Frames
  – Interrupt objects
  – Untyped memory
• System calls
  – Send, Wait (and variants)
  – Yield

What are (Object) Capabilities?

Cap = Access Token: Prima-facie evidence of privilege

[Figure: a cap consists of an object reference (pointing to an object, eg. thread, file, …) and access rights (eg. read, write, send, execute…)]

• Cap typically in kernel to protect from forgery
  – user references cap through handle
• OO API:
    err = method( cap, args );
• Used in some earlier microkernels:
  – KeyKOS [‘85], Mach [‘87], EROS [‘99]


seL4 Capabilities

• Stored in cap space (CSpace)
  – Kernel object made up of CNodes
  – each an array of cap “slots”
• Inaccessible to userland
  – But referred to by pointers into CSpace (slot addresses)
  – These CSpace addresses are called CPTRs
• Caps convey specific privilege (access rights)
  – Read, Write, Grant (cap transfer)
  – Note: Mainline has Execute too
• Main operations on caps:
  – Invoke: perform operation on object referred to by cap
    o Possible operations depend on object type
  – Copy/Mint/Grant: create copy of cap with same/lesser privilege
  – Move/Mutate: transfer to different address with same/lesser privilege
  – Delete: invalidate slot (cleans up object if this is the only cap to it)
  – Revoke: delete any derived (eg. copied or minted) caps

Inter-Process Communication (IPC)

• Fundamental microkernel operation
  – Kernel provides no services, only mechanisms
  – OS services provided by (protected) user-level server processes
  – invoked by IPC

[Figure: client and server communicating by IPC on top of seL4]

• seL4 IPC uses a handshake through endpoints:
  – Transfer points without storage capacity
  – Message must be transferred instantly
    o Single-copy user → user by kernel

[Figure: send and receive meeting at an endpoint]

IPC: (Synchronous) Endpoints

[Figure: timeline of Thread1 and Thread2 alternating between Running and Blocked. Thread2 calls Wait (ep1_cap, …) and blocks until Thread1 calls Send (ep1_cap, …); Thread1 then calls Wait (ep2_cap, …) and blocks until Thread2 calls Send (ep2_cap, …)]

• Threads must rendezvous for message transfer
  – One side blocks until the other is ready
  – Implicit synchronisation
• Message copied from sender’s to receiver’s message registers
  – Message is combination of caps and data words
    o Presently max 121 words (484B, incl message “tag”)
    o Should never use anywhere near that much!

IPC Endpoints are Message Queues

[Figure: Server, Client1 and Client2 in user mode; in the kernel, TCB 1 and TCB 2 queued on the EP. The first invocation queues the caller; further callers of the same direction queue behind]

• EP has no sense of direction
• May queue senders or receivers
  – never both at the same time!
• Communication needs 2 EPs!
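The queueing rule above (an endpoint holds either waiting senders or waiting receivers, never both) can be sketched as a small state machine. This is a user-level model for illustration only, not kernel code: `struct endpoint`, `ep_send()`, `ep_wait()` and the fixed queue size are all hypothetical names chosen for this sketch, and message contents are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an endpoint; not the kernel's data structure. */
enum ep_state { EP_IDLE, EP_SENDERS, EP_RECEIVERS };

#define EP_MAX_WAITERS 8

struct endpoint {
    enum ep_state state;          /* which direction is currently queued */
    int queue[EP_MAX_WAITERS];    /* blocked thread ids, FIFO order */
    size_t head, count;
};

void ep_init(struct endpoint *ep)
{
    ep->state = EP_IDLE;
    ep->head = 0;
    ep->count = 0;
}

/* Append a blocked thread to the tail of the queue. */
void ep_enqueue(struct endpoint *ep, int tid)
{
    assert(ep->count < EP_MAX_WAITERS);
    ep->queue[(ep->head + ep->count) % EP_MAX_WAITERS] = tid;
    ep->count++;
}

/* Remove the longest-waiting thread; the endpoint goes idle when empty. */
int ep_dequeue(struct endpoint *ep)
{
    assert(ep->count > 0);
    int tid = ep->queue[ep->head];
    ep->head = (ep->head + 1) % EP_MAX_WAITERS;
    ep->count--;
    if (ep->count == 0) {
        ep->state = EP_IDLE;
    }
    return tid;
}

/* Common path: rendezvous with a queued partner, or block behind
 * earlier callers of the same direction. Returns the partner's thread
 * id on rendezvous, or -1 if the caller blocks. */
int ep_invoke(struct endpoint *ep, int tid, enum ep_state waiting_as)
{
    enum ep_state partner =
        (waiting_as == EP_SENDERS) ? EP_RECEIVERS : EP_SENDERS;
    if (ep->state == partner) {
        return ep_dequeue(ep);
    }
    ep->state = waiting_as;
    ep_enqueue(ep, tid);
    return -1;
}

int ep_send(struct endpoint *ep, int tid) { return ep_invoke(ep, tid, EP_SENDERS); }
int ep_wait(struct endpoint *ep, int tid) { return ep_invoke(ep, tid, EP_RECEIVERS); }
```

In this model two senders queue in order behind each other, and each later `ep_wait()` pairs with the oldest waiting sender. Because the endpoint never stores a message, a reply has to travel over a second path, which is why the slide says communication needs 2 EPs.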


Client-Server Communication

• Asymmetric relationship:
  – Server widely accessible, clients not
  – How can server reply back to client (distinguish between them)?
• Client can pass (session) reply cap in first request
  – server needs to maintain session state
  – forces stateful server design
• seL4 solution: Kernel provides single-use reply cap
  – only for Call operation (Send+Wait)
  – allows server to reply to client
  – cannot be copied/minted/re-used but can be moved
  – one-shot (automatically destroyed after first use)

[Figure: one server with Client1 and Client2]

Call RPC Semantics

[Figure: sequence diagram with columns Client, Kernel, Server]

• Server: Wait(ep,&rep)
• Client: Call(ep,…)
• Kernel: mint rep, deliver to server
• Server: process, then Send(rep,…)
• Kernel: deliver to client, destroy rep
• Client and server: continue processing

Identifying Clients

Stateful server serving multiple clients

• Must respond to correct client
  – Ensured by reply cap
• Must associate request with correct state
• Could use separate EP per client
  – endpoints are lightweight (16 B)
  – but requires mechanism to wait on a set of EPs (like select)
• Instead, seL4 allows caps to the same EP to be individually marked (“badged”)
  – server provides individually badged caps to clients
  – server tags client state with badge (through Mint())
  – kernel delivers badge to receiver on invocation of badged caps

[Figure: server holding Client1 state and Client2 state, serving Client1 and Client2]

IPC Mechanics: Virtual Registers

• Like physical registers, virtual registers are thread state
  – context-switched by kernel
  – implemented as physical registers or thread-local memory
• Message registers
  – contain message transferred in IPC
  – architecture-dependent subset mapped to physical registers
    o 5 on ARM, 3 on
  – library interface hides details
    o 1st transferred word is special, contains message tag
  – API MR[0] refers to next word (not the tag!)
• Reply cap
  – overwritten by next receive!
  – can move to CSpace with cspace_save_reply_cap()
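The badge mechanism from the Identifying Clients slide amounts to a lookup from the badge delivered on receive to the matching per-client state. The sketch below models only that server-side bookkeeping in plain C. `badge_t`, `client_register()`, `client_lookup()` and `server_handle()` are hypothetical names, and the minting of the badged cap itself is assumed to happen elsewhere.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long badge_t;    /* stand-in for seL4_Word */

struct client_state {
    badge_t badge;      /* badge minted into this client's EP cap */
    int     in_use;
    int     requests;   /* example of per-client session state */
};

#define MAX_CLIENTS 4
struct client_state clients[MAX_CLIENTS];

/* Record state for a new client. The server would then mint an EP cap
 * carrying this badge and hand it to the client (not modelled here). */
struct client_state *client_register(badge_t badge)
{
    assert(badge != 0);   /* 0 is reserved here to mean "unbadged" */
    for (size_t i = 0; i < MAX_CLIENTS; i++) {
        if (!clients[i].in_use) {
            clients[i].badge = badge;
            clients[i].in_use = 1;
            clients[i].requests = 0;
            return &clients[i];
        }
    }
    return NULL;          /* table full */
}

/* Map the badge delivered by the receive operation to client state. */
struct client_state *client_lookup(badge_t badge)
{
    for (size_t i = 0; i < MAX_CLIENTS; i++) {
        if (clients[i].in_use && clients[i].badge == badge) {
            return &clients[i];
        }
    }
    return NULL;
}

/* One iteration of a server loop, after the receive has returned.
 * Returns the client's request count, or -1 for an unknown badge. */
int server_handle(badge_t badge)
{
    struct client_state *c = client_lookup(badge);
    if (c == NULL) {
        return -1;
    }
    return ++c->requests;
}
```

A single endpoint and a single blocking receive are enough here: the badge, not the endpoint, distinguishes the clients.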


IPC Message Format

[Figure: message layout]

• Tag
  – Label: meaning defined by IPC protocol (kernel or user)
  – Caps unwrapped: bitmap indicating caps which had badges extracted
  – # Caps: caps sent or received
  – Msg Length
• Raw data
• Caps (on Send) / Badges (on Receive)
• CSpace reference for receiving caps (Receive only)

Note: Don’t need to deal with this explicitly for project

Client-Server IPC Example

Client

    /* Load into tag register */
    seL4_MessageInfo_t tag = seL4_MessageInfo_new(0, 0, 0, 1);
    seL4_SetTag(tag);
    /* Set message register #0 */
    seL4_SetMR(0, 1);
    seL4_Call(server_c, tag);

Server

    /* Allocate EP and retype */
    seL4_Word addr = ut_alloc(seL4_EndpointBits);
    err = cspace_ut_retype_addr(addr, seL4_EndpointObject,
                                seL4_EndpointBits, cur_cspace, &ep_cap);
    /* Insert EP into CSpace; cap is badged 0xff */
    seL4_CPtr cap = cspace_mint_cap(dest, cur_cspace, ep_cap, seL4_AllRights,
                                    seL4_CapData_Badge_new(0xff));
    …
    seL4_Word badge;
    seL4_MessageInfo_t msg = seL4_Wait(ep, &badge);
    …
    seL4_MessageInfo_t reply = seL4_MessageInfo_new(0, 0, 0, 0);
    /* Implicit use of reply cap */
    seL4_Reply(reply);
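To make the four tag fields concrete, here is a minimal sketch of packing them into one 32-bit word. The field widths (20-bit label, 3-bit caps-unwrapped bitmap, 2-bit cap count, 7-bit length) and the names `msg_tag_t`, `tag_new()` and the getters are assumptions made for this illustration and are not taken from the slides. The argument order is assumed to mirror the `seL4_MessageInfo_new(label, caps unwrapped, # caps, length)` calls in the example above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field widths, for illustration only. */
enum {
    LEN_BITS       = 7,    /* message length in words */
    EXTRA_BITS     = 2,    /* number of caps sent or received */
    UNWRAPPED_BITS = 3,    /* bitmap of caps whose badges were extracted */
    LABEL_BITS     = 20    /* meaning defined by the IPC protocol */
};

typedef struct { uint32_t word; } msg_tag_t;   /* hypothetical tag type */

msg_tag_t tag_new(uint32_t label, uint32_t unwrapped,
                  uint32_t extra_caps, uint32_t length)
{
    /* Reject values that do not fit their field. */
    assert(label      < (1u << LABEL_BITS));
    assert(unwrapped  < (1u << UNWRAPPED_BITS));
    assert(extra_caps < (1u << EXTRA_BITS));
    assert(length     < (1u << LEN_BITS));

    msg_tag_t t;
    t.word = (label      << (LEN_BITS + EXTRA_BITS + UNWRAPPED_BITS))
           | (unwrapped  << (LEN_BITS + EXTRA_BITS))
           | (extra_caps << LEN_BITS)
           | length;
    return t;
}

uint32_t tag_length(msg_tag_t t)
{
    return t.word & ((1u << LEN_BITS) - 1);
}

uint32_t tag_extra_caps(msg_tag_t t)
{
    return (t.word >> LEN_BITS) & ((1u << EXTRA_BITS) - 1);
}

uint32_t tag_unwrapped(msg_tag_t t)
{
    return (t.word >> (LEN_BITS + EXTRA_BITS)) & ((1u << UNWRAPPED_BITS) - 1);
}

uint32_t tag_label(msg_tag_t t)
{
    return t.word >> (LEN_BITS + EXTRA_BITS + UNWRAPPED_BITS);
}
```

Under these assumptions, the client's tag in the example (one data word, no caps, label 0) packs to the single word 1.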

Server Saving Reply Cap

Server

    seL4_Word addr = ut_alloc(seL4_EndpointBits);
    err = cspace_ut_retype_addr(addr, seL4_EndpointObject,
                                seL4_EndpointBits, cur_cspace, &ep_cap);
    seL4_CPtr cap = cspace_mint_cap(dest, cur_cspace, ep_cap, seL4_AllRights,
                                    seL4_CapData_Badge_new(0xff));
    …
    seL4_Word badge;
    seL4_MessageInfo_t msg = seL4_Wait(ep, &badge);
    /* Save reply cap in CSpace */
    seL4_CPtr slot = cspace_save_reply_cap(cur_cspace);
    …
    seL4_MessageInfo_t reply = seL4_MessageInfo_new(0, 0, 0, 0);
    /* Explicit use of reply cap */
    seL4_Send(slot, reply);
    /* Reply cap no longer valid */
    cspace_free_slot(slot);

IPC Operations Summary

• Send (ep_cap, …), Wait (ep_cap, …)
  – blocking message passing
  – needs Write, Read permission, respectively
• NBSend (ep_cap, …)
  – Polling send: silently discard message if receiver isn’t ready
• Call (ep_cap, …)
  – equivalent to Send (ep_cap,…) + reply-cap + Wait (ep_cap,…)
  – Atomic: guarantees caller is ready to receive reply
• Reply (…)
  – equivalent to Send (rep_cap, …)
• ReplyWait (ep_cap, …)
  – equivalent to Reply (…) + Wait (ep_cap, …)
  – at present solely for efficiency of server operation

No failure notification where this reveals info on other entities!
Need error handling protocol!


Notifications: Asynchronous Endpoints

• Logically, AEP is an array of binary semaphores
  – Multiple signalling, select-like wait
  – Not a message-passing IPC operation!
• Implemented by data word in AEP
  – Send OR-s sender’s cap badge to data word
  – Receiver can poll or wait
    o waiting returns and clears data word
    o polling just returns data word

[Figure: timeline of Thread1 and Thread2 alternating between Running and Blocked. Thread2 calls w = Poll (ep_cap, …) and later w = Wait (ep_cap,…); Thread1 calls Notify (aep_cap, …) twice]

Receiving from EP and AEP

Server with synchronous and asynchronous interface

• Example: file system
  – synchronous (RPC-style) client protocol
  – asynchronous notifications from driver
• Could have separate threads waiting on endpoints
  – forces multi-threaded server, concurrency control
• Alternative: allow single thread to wait on both EP types
  – AEP is bound to thread with BindAEP() syscall
  – thread waits on synchronous endpoint
  – Notification delivered as if caller had been waiting on AEP

[Figure: server between a client and a driver]
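The data-word behaviour described on the Notifications slide can be sketched in a few lines. This is a single-threaded model under simplifying assumptions: `struct aep`, `aep_notify()`, `aep_poll()` and `aep_wait()` are hypothetical names, and blocking is not modelled, so a wait on an empty word simply returns 0 here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of an asynchronous endpoint: one data word. */
struct aep {
    uint32_t word;
};

/* Signalling ORs the sender's cap badge into the data word, so
 * repeated signals with the same badge collapse into one bit pattern. */
void aep_notify(struct aep *a, uint32_t badge)
{
    assert(badge != 0);   /* a zero badge would leave no trace in the word */
    a->word |= badge;
}

/* Polling just returns the data word, as described on the slide. */
uint32_t aep_poll(const struct aep *a)
{
    return a->word;
}

/* Waiting returns and clears the data word. A real wait would block
 * while the word is 0; this model returns 0 instead. */
uint32_t aep_wait(struct aep *a)
{
    uint32_t w = a->word;
    a->word = 0;
    return w;
}
```

Giving each signaller a badge with a distinct bit lets the receiver tell from one returned word which sources fired, which is the "array of binary semaphores" view.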

AOS vs Mainline Kernel Differences

• “Synchronous” vs “asynchronous” endpoint terminology is confusing
• seL4 really has only synchronous IPC, plus signal-like notifications
• Fixed in recent mainline kernels

AOS Kernel → Mainline
• Sync EP, sync message → EP, message
• AEP, async notification → Notification obj, notification
• Send/Receive/Call/Reply&Wait → Send/Receive/Call/Reply&Wait
• NBSend (EP) → NBSend, Poll, NBReply&Wait
• AEP: NBSend, Wait → Signal, Poll, Wait

Derived Capabilities

• Badging is an example of capability derivation
• The Mint operation creates a new, less powerful cap
  – Can add a badge
    o Mint ( , )
  – Can strip access rights
    o eg WR → R/O
• Remember, caps are kernel objects!
• Granting transfers caps over an Endpoint
  – Delivers copy of sender’s cap(s) to receiver
    o reply caps are a special case of this
  – Sender needs Endpoint cap with Grant permission
  – Receiver needs Endpoint cap with Write permission
    o else Write permission is stripped from new cap
• Retyping
  – Fundamental operation of seL4 memory management
  – Details later…
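The derivation rules on the Derived Capabilities slide can be sketched with a toy capability type. Everything here is a model for illustration: `struct cap`, `cap_mint()`, `cap_grant()` and the rights constants are hypothetical, and the model ignores any further restrictions the kernel may impose (for example on re-badging).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical rights bits for this sketch. */
enum { CAP_READ = 1, CAP_WRITE = 2, CAP_GRANT = 4 };

struct cap {
    int      object;   /* which kernel object the cap refers to */
    unsigned rights;   /* subset of CAP_READ | CAP_WRITE | CAP_GRANT */
    uint32_t badge;    /* 0 means unbadged */
};

/* Mint: derive a cap to the same object with equal or lesser rights,
 * optionally carrying a badge. */
struct cap cap_mint(struct cap src, unsigned rights_mask, uint32_t badge)
{
    struct cap derived = src;
    derived.rights = src.rights & rights_mask;   /* can only strip rights */
    derived.badge = badge;
    assert((derived.rights & ~src.rights) == 0); /* never gains authority */
    return derived;
}

/* Grant: the receiver gets a copy of the sent cap. If the receiver's
 * own endpoint cap lacks Write, Write is stripped from the new cap. */
struct cap cap_grant(struct cap sent, unsigned receiver_ep_rights)
{
    struct cap copy = sent;
    if (!(receiver_ep_rights & CAP_WRITE)) {
        copy.rights &= ~(unsigned)CAP_WRITE;
    }
    return copy;
}
```

Minting a read-only cap and then minting again with a wider mask still yields read-only, which is the sense in which a derived cap is "less powerful".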


seL4 System Calls

Note: Will change soon

• Notionally, seL4 has 6 syscalls:
  – Yield(): invokes scheduler
    o only syscall which doesn’t require a cap!
  – Send(), Receive() and 3 variants/combinations thereof
    o Notify() is actually not a separate syscall but same as Send()
  – This is why I earlier said “approximately 3 syscalls”
• All other kernel operations are invoked by “messaging”
  – Invoking Call() on an object cap
    o Logically sending a message to the kernel
  – Each object has a set of kernel protocols
    o operations encoded in message tag
    o parameters passed in message words
  – Mostly hidden behind “syscall” wrappers

seL4 Memory-Management Principles

• Memory (and caps referring to it) is typed:
  – Untyped memory:
    o unused, free to Retype into something else
  – Frames:
    o (can be) mapped to address spaces, no kernel semantics
  – Rest: TCBs, address spaces, CNodes, EPs
    o used for specific kernel data structures
• After startup, kernel never allocates memory!
  – All remaining memory made Untyped, handed to initial address space
• Space for kernel objects must be explicitly provided to kernel
  – Ensures strong resource isolation
• Extremely powerful tool for shooting oneself in the foot!
  – We hide much of this behind the cspace and ut allocation libraries

Capability Derivation

• Copy, Mint, Mutate, Revoke are invoked on CNodes
    Mint( , dest, src, rights, )
  – CNode cap must provide appropriate rights
• Copy takes a cap for destination
  – Allows copying of caps between Cspaces
  – Alternative to granting via IPC (if you have privilege to access Cspace!)

Cspace Operations

    extern cspace_t * cspace_create(int levels); /* either 1 or 2 level */
    extern cspace_err_t cspace_destroy(cspace_t *c);

    extern seL4_CPtr cspace_copy_cap(cspace_t *dest, cspace_t *src,
                                     seL4_CPtr src_cap, seL4_CapRights rights);

    extern seL4_CPtr cspace_mint_cap(cspace_t *dest, cspace_t *src,
                                     seL4_CPtr src_cap, seL4_CapRights rights,
                                     seL4_CapData badge);

    extern seL4_CPtr cspace_move_cap(cspace_t *dest, cspace_t *src,
                                     seL4_CPtr src_cap);

    extern cspace_err_t cspace_delete_cap(cspace_t *c, seL4_CPtr cap);

    extern cspace_err_t cspace_revoke_cap(cspace_t *c, seL4_CPtr cap);


cspace and ut libraries

[Figure: the user-level OS personality sits on seL4 and reaches it through system calls and through library calls]

• ut_alloc(), ut_free(), …
  – Manages slab of Untyped
• cspace_create(), cspace_destroy(), …
  – Wraps messy Cspace tree & slot management
• Extend for own needs!

seL4 Memory Management Approach

[Figure: RAM holds kernel data plus data for the Global Resource Manager (GRM) and for each Resource Manager (RM); the GRM delegates resources to RMs, and each RM manages its own address spaces]

• Strong isolation, no shared kernel resources
• Resources fully delegated, allows autonomous operation

Memory Management Mechanics: Retype

Note: Mainline and AOS kernels differ, both more general

[Figure: a tree of Untyped objects (UT0 … UT4) subdivided by Retype (Untyped, 2^1), Retype (Frame, 2^2), Retype (CNode, 2^m, 2^n) and Retype (TCB, 2^n) into frames F0 … F3 and other objects; caps carry r,w rights, Mint (r) derives a read-only cap, and Revoke() removes derived caps]

seL4 Address Spaces (VSpaces)

• Very thin wrapper around hardware page tables
  – Architecture-dependent
  – ARM and (32-bit) x86 are very similar
• Page directories (PDs) map page tables, page tables (PTs) map pages
• A VSpace is represented by a PD object:
  – Creating a PD (by Retype) creates the VSpace
  – Deleting the PD deletes the VSpace

[Figure: PageTable_Map(PD) installs a page table in the page directory; Page_Map(PT) installs a frame in a page table]


Address Space Operations

    seL4_Word frame_addr = ut_alloc(seL4_PageBits);
    err = cspace_ut_retype_addr(frame_addr, seL4_ARM_Page,
                                seL4_ARM_PageBits, cur_cspace, &frame_cap);
    /* pd_cap: cap to level 1 page table */
    map_page(frame_cap, pd_cap, 0xA0000000, seL4_AllRights,
             seL4_ARM_Default_VMAttributes);
    bzero((void *)0xA0000000, PAGESIZE);

• Each mapping has:
  – virtual_address, phys_address, address_space and frame_cap
  – address_space struct identifies the level 1 page_directory cap
  – you need to keep track of (frame_cap, PD_cap, v_adr, p_adr)!

    seL4_ARM_Page_Unmap(frame_cap);
    cspace_delete_cap(cur_cspace, frame_cap);
    ut_free(frame_addr, seL4_PageBits);

Multiple Frame Mappings: Shared Memory

    seL4_CPtr new_frame_cap = cspace_copy_cap(cur_cspace, cur_cspace,
                                              existing_frame_cap, seL4_AllRights);
    map_page(new_frame_cap, pd_cap, 0xA0000000, seL4_AllRights,
             seL4_ARM_Default_VMAttributes);
    bzero((void *)0xA0000000, PAGESIZE);

    seL4_ARM_Page_Unmap(existing_frame_cap);
    cspace_delete_cap(cur_cspace, existing_frame_cap);
    seL4_ARM_Page_Unmap(new_frame_cap);
    cspace_delete_cap(cur_cspace, new_frame_cap);
    ut_free(frame_addr, seL4_PageBits);

• Each mapping requires its own frame cap even for the same frame
  – Poor API choice!
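The advice to keep track of (frame_cap, PD_cap, v_adr, p_adr) suggests a small bookkeeping table on the user side. The sketch below is one possible shape for it, assuming 4 KiB pages. `cptr_t`, `struct mapping`, `mapping_record()`, `mapping_find()` and `mapping_forget()` are hypothetical names, and the actual map and unmap calls are left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t cptr_t;              /* stand-in for seL4_CPtr */

#define PAGE_MASK (~(uintptr_t)0xFFF) /* assumes 4 KiB pages */
#define MAX_MAPPINGS 16

/* One record per mapping: each mapping has its own frame cap,
 * even when several mappings refer to the same physical frame. */
struct mapping {
    cptr_t    frame_cap;
    cptr_t    pd_cap;    /* identifies the address space */
    uintptr_t vaddr;
    uintptr_t paddr;
    int       valid;
};

struct mapping mapping_table[MAX_MAPPINGS];

/* Record a mapping after a successful map. Returns 0, or -1 if full. */
int mapping_record(cptr_t frame_cap, cptr_t pd_cap,
                   uintptr_t vaddr, uintptr_t paddr)
{
    assert((vaddr & ~PAGE_MASK) == 0 && (paddr & ~PAGE_MASK) == 0);
    for (size_t i = 0; i < MAX_MAPPINGS; i++) {
        if (!mapping_table[i].valid) {
            mapping_table[i].frame_cap = frame_cap;
            mapping_table[i].pd_cap = pd_cap;
            mapping_table[i].vaddr = vaddr;
            mapping_table[i].paddr = paddr;
            mapping_table[i].valid = 1;
            return 0;
        }
    }
    return -1;
}

/* Find the mapping covering vaddr in the given address space. */
struct mapping *mapping_find(cptr_t pd_cap, uintptr_t vaddr)
{
    uintptr_t page = vaddr & PAGE_MASK;
    for (size_t i = 0; i < MAX_MAPPINGS; i++) {
        if (mapping_table[i].valid && mapping_table[i].pd_cap == pd_cap
            && mapping_table[i].vaddr == page) {
            return &mapping_table[i];
        }
    }
    return NULL;
}

/* Drop the record and return the frame cap the caller must now unmap
 * and delete; returns 0 if there was no such mapping. */
cptr_t mapping_forget(cptr_t pd_cap, uintptr_t vaddr)
{
    struct mapping *m = mapping_find(pd_cap, vaddr);
    if (m == NULL) {
        return 0;
    }
    m->valid = 0;
    return m->frame_cap;
}
```

Keeping the physical address in the record makes it possible to tell when the last mapping of a frame has gone and its untyped memory can be freed.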

Memory Management Caveats

• The object manager handles allocation for you
• Very simple buddy-allocator, you need to understand how it works:
  – Freeing an object of size n: you can allocate new objects <= size n
  – Freeing 2 objects of size n does not mean that you can allocate an object of size 2n.

Object           Size (B), ARM   Alignment (B), ARM
Frame            2^12            2^12
Page directory   2^14            2^14
Endpoint         2^4             2^4
Cslot            2^4             2^4
Cnode            2^14            2^14   (implementation choice!)
TCB              2^9             2^9
Page table       2^10            2^10

Memory-Management Caveats

• Objects are allocated by Retype() of Untyped memory
• The kernel will not allow you to overlap objects
  – But debugging nightmare if you try!!
• ut_alloc and ut_free() manage user-level’s view of Untyped allocation.
  – Major pain if kernel and user’s view diverge
  – TIP: Keep objects address and CPtr together.
• Be careful with allocations!
• Don’t try to allocate all of physical memory as frames, you need more memory for TCBs, endpoints etc.
• Your frametable will eventually integrate with ut_alloc to manage the 4KiB untyped size.

[Figure: Untyped Memory of 2^15 B divided into 8 frames]
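The two allocator caveats (a freed object of size n can only satisfy requests up to n, and two freed objects of size n do not combine into one of size 2n) can be seen in a deliberately simplified model. This is not the course's ut library: `struct ut_model` and its functions are hypothetical, and the model simply keeps one free list per size with no coalescing.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MIN_BITS 4      /* smallest object: 2^4 B */
#define MAX_BITS 15     /* largest object: 2^15 B */
#define MAX_FREE 64     /* free-list capacity per size */

/* Hypothetical model: bump allocation from a region, plus one free
 * list per object size. Freed objects are never merged or split. */
struct ut_model {
    uintptr_t next, end;                          /* untouched part of region */
    uintptr_t free_list[MAX_BITS + 1][MAX_FREE];
    int       free_count[MAX_BITS + 1];
};

void ut_model_init(struct ut_model *m, uintptr_t base, int region_bits)
{
    assert(base != 0);                            /* 0 is the failure value */
    memset(m, 0, sizeof *m);
    m->next = base;
    m->end = base + ((uintptr_t)1 << region_bits);
}

/* Allocate 2^bits bytes, naturally aligned. Returns 0 on failure. */
uintptr_t ut_model_alloc(struct ut_model *m, int bits)
{
    assert(bits >= MIN_BITS && bits <= MAX_BITS);
    if (m->free_count[bits] > 0) {
        return m->free_list[bits][--m->free_count[bits]];  /* exact size only */
    }
    uintptr_t size = (uintptr_t)1 << bits;
    uintptr_t addr = (m->next + size - 1) & ~(size - 1);
    if (addr + size > m->end) {
        return 0;
    }
    m->next = addr + size;
    return addr;
}

/* Return an object to the free list for its own size. */
void ut_model_free(struct ut_model *m, uintptr_t addr, int bits)
{
    assert(bits >= MIN_BITS && bits <= MAX_BITS);
    assert(m->free_count[bits] < MAX_FREE);
    m->free_list[bits][m->free_count[bits]++] = addr;
}
```

With a 2^15 B region, eight 2^12 B frames fit exactly, matching the figure. After freeing two adjacent frames, a 2^13 B request still fails in this model while a 2^12 B request succeeds.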


Threads

• Threads are represented by TCB objects
• They have a number of attributes (recorded in TCB):
  – VSpace: a virtual address space
    o page directory reference
    o multiple threads can belong to the same VSpace
  – CSpace: capability storage
    o CNode reference (CSpace root) plus a few other bits
  – Fault endpoint
    o Kernel sends message to this EP if the thread throws an exception
  – IPC buffer (backing storage for virtual registers)
  – stack pointer (SP), instruction pointer (IP), user-level registers
  – Scheduling priority
  – Time slice length (presently a system-wide constant)
    o Yes, this is broken! Fixed in later kernels
• These must be explicitly managed
  – … we provide an example you can modify

Threads

Creating a thread
• Obtain a TCB object
• Set attributes: Configure()
  – associate with VSpace, CSpace, fault EP, prio, define IPC buffer
• Set SP, IP (and optionally other registers): WriteRegisters()
  – this results in a completely initialised thread
  – will be able to run if resume_target is set in call, else still inactive
• Activated (made schedulable): Resume()

Creating a Thread in Own AS and Cspace

    static char stack[100];

    int thread_fct() {
        while(1);
        return 0;
    }

    /* Allocate and map new frame for IPC buffer as before */
    seL4_Word tcb_addr = ut_alloc(seL4_TCBBits);
    err = cspace_ut_retype_addr(tcb_addr, seL4_TCBObject, seL4_TCBBits,
                                cur_cspace, &tcb_cap);
    err = seL4_TCB_Configure(tcb_cap, FAULT_EP_CAP, PRIORITY,
                             cur_cspace->root_cnode, seL4_NilData,
                             seL4_CapInitThreadPD, seL4_NilData,
                             PROCESS_IPC_BUFFER, ipc_buffer_cap);
    /* Entry point is the thread function; the stack grows down,
     * so SP starts at the top of the array */
    seL4_UserContext context = { .pc = (seL4_Word)thread_fct,
                                 .sp = (seL4_Word)(stack + sizeof(stack)) };
    seL4_TCB_WriteRegisters(tcb_cap, 1, 0, 2, &context);

If you use threads, write a library to create and destroy them.

Threads and Stacks

• Stacks are completely user-managed, kernel doesn’t care!
  – Kernel only preserves SP, IP on context switch
• Stack location, allocation, size must be managed by userland
• Beware of stack overflow!
  – Easy to grow stack into other data
    o Pain to debug!
  – Take special care with automatic arrays!

[Figure: Stack 1 and Stack 2 adjacent in memory; a function such as f () { int buf[10000]; . . . } overruns one stack into the other]


Creating a Thread in New AS and CSpace

    /* Allocate, retype and map new frame for IPC buffer as before
     * Allocate and map stack???
     * Allocate and retype a TCB as before
     * Allocate and retype a seL4_ARM_PageDirectoryObject of size seL4_PageDirBits
     * Mint a new badged cap to the syscall endpoint
     */
    cspace_t *new_cspace = cspace_create(1);
    char *elf_base = cpio_get_file(_cpio_archive, "test")->p_base;
    err = elf_load(new_pagedirectory_cap, elf_base);
    unsigned int entry = elf_getEntryPoint(elf_base);
    err = seL4_TCB_Configure(tcb_cap, FAULT_EP_CAP, PRIORITY,
                             new_cspace->root_cnode, seL4_NilData,
                             new_pagedirectory_cap, seL4_NilData,
                             PROCESS_IPC_BUFFER, ipc_buffer_cap);
    seL4_UserContext context = {.pc = entry, .sp = &stack};
    seL4_TCB_WriteRegisters(tcb_cap, 1, 0, 2, &context);

seL4 Scheduling

Note: Better model in “RT” branch, merge soon

• Present seL4 scheduling model is fairly naïve
• 256 hard priorities (0–255)
  – Priorities are strictly observed
  – The scheduler will always pick the highest-prio runnable thread
  – Round-robin scheduling within prio level
• Aim is real-time performance, not fairness
  – Kernel itself will never change the prio of a thread
  – Achieving fairness (if desired) is the job of user-level servers

[Figure: array of run queues indexed by prio, 0 to 255]
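The scheduling policy above (strict priorities, round-robin within a level) can be sketched as an array of run queues. This is a user-level model rather than the kernel's scheduler: `struct scheduler`, `sched_make_runnable()`, `sched_pick()` and `sched_block_last_picked()` are hypothetical names, and every call to `sched_pick()` is treated as a time-slice expiry.

```c
#include <assert.h>
#include <string.h>

#define NUM_PRIOS    256   /* priorities 0..255, 255 is highest */
#define MAX_PER_PRIO 8     /* queue capacity per priority level */

struct run_queue {
    int tids[MAX_PER_PRIO];   /* circular buffer of runnable thread ids */
    int head, count;
};

struct scheduler {
    struct run_queue rq[NUM_PRIOS];
};

void sched_init(struct scheduler *s)
{
    memset(s, 0, sizeof *s);
}

/* Append a thread to the tail of the queue for its priority. */
void sched_make_runnable(struct scheduler *s, int tid, int prio)
{
    assert(prio >= 0 && prio < NUM_PRIOS);
    struct run_queue *q = &s->rq[prio];
    assert(q->count < MAX_PER_PRIO);
    q->tids[(q->head + q->count) % MAX_PER_PRIO] = tid;
    q->count++;
}

/* Pick the highest-priority runnable thread and rotate it to the back
 * of its queue, giving round-robin within the level.
 * Returns -1 if nothing is runnable. */
int sched_pick(struct scheduler *s)
{
    for (int p = NUM_PRIOS - 1; p >= 0; p--) {
        struct run_queue *q = &s->rq[p];
        if (q->count > 0) {
            int tid = q->tids[q->head];
            q->head = (q->head + 1) % MAX_PER_PRIO;
            q->tids[(q->head + q->count - 1) % MAX_PER_PRIO] = tid;
            return tid;
        }
    }
    return -1;
}

/* Block the thread most recently picked at this priority: after
 * sched_pick() it sits at the tail, so dropping the tail removes it. */
void sched_block_last_picked(struct scheduler *s, int prio)
{
    assert(prio >= 0 && prio < NUM_PRIOS);
    assert(s->rq[prio].count > 0);
    s->rq[prio].count--;
}
```

In this model a lower-priority thread is chosen only once every higher-priority thread has blocked, which illustrates why fairness has to come from user-level policy.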

Exception Handling

• A thread can trigger different kinds of exceptions:
  – invalid syscall
    o may require instruction emulation or result from virtualization
  – capability fault
    o cap lookup failed or operation is invalid on cap
  – page fault
    o attempt to access unmapped memory
    o may have to grow stack, grow heap, load dynamic library, …
  – architecture-defined exception
    o divide by zero, unaligned access, …
• Results in kernel sending message to fault endpoint
  – exception protocol defines state info that is sent in message
• Replying to this message restarts the thread
  – endless loop if you don’t remove the cause for the fault first!

Interrupt Handling

[Figure: interrupt cycle]

• Handler waits on AEP
• IRQ triggered
• Kernel fakes notification on AEP
• Interrupt handler (driver) performs appropriate action
• Kernel ACKs IRQ


Interrupt Management

• seL4 models IRQs as messages sent to an AEP
  – Interrupt handler has Receive cap on that AEP
• 2 special objects used for managing and acknowledging interrupts:
  – Single IRQControl object
    o single IRQControl cap provided by kernel to initial VSpace
    o only purpose is to create IRQHandler caps
  – Per-IRQ-source IRQHandler object
    o interrupt association and dissociation
    o interrupt acknowledgment

[Figure: Get(usb) on IRQControl yields an IRQHandler]

Interrupt Handling

• IRQHandler cap allows driver to bind AEP to interrupt
• Afterwards:
  – AEP is used to receive interrupt
  – IRQHandler is used to acknowledge interrupt

[Figure: SetEndpoint(aep) on the IRQHandler, then Wait(aep), then Ack(handler)]

    seL4_IRQHandler interrupt = cspace_irq_control_get_cap(cur_cspace,
                                    seL4_CapIRQControl, irq_number);
    seL4_IRQHandler_SetEndpoint(interrupt, async_ep_cap);
    seL4_IRQHandler_Ack(interrupt);

Note: ACK first to unmask IRQ

Device Drivers

• In seL4 (and all other L4 kernels) drivers are usermode processes
• Drivers do three things:
  – Handle interrupts (already explained)
  – Communicate with rest of OS (IPC + shared memory)
  – Access device registers
• Device register access
  – Devices are memory-mapped on ARM
  – Have to find frame cap from bootinfo structure
  – Map the appropriate page in the driver’s VSpace

    device_vaddr = map_device(0xA0000000, (1 << seL4_PageBits));
    …
    /* Magic device register access */
    *((volatile seL4_Word *) device_vaddr) = …;

Project Platform: i.MX6 Sabre Lite

[Figure: board block diagram: ARMv7 Cortex A9 CPU, 1 GiB memory, timer & other devices, serial port and Ethernet]

• Serial port
  – seL4_DebugPutChar()
  – M0: serial over LAN for userlevel apps
• Ethernet
  – M6: Network File System (NFS)


in the Real World (Courtesy Boeing, DARPA)
