COMP9242 Advanced OS S2/2018 W01: Introduction to seL4
@GernotHeiser

Copyright Notice

These slides are distributed under the Creative Commons Attribution 3.0 License

• You are free:
  – to share: to copy, distribute and transmit the work
  – to remix: to adapt the work
• Under the following conditions:
  – Attribution: You must attribute the work (but not in any way that suggests that the author endorses you or your use of the work) as follows: “Courtesy of Gernot Heiser, UNSW Sydney”

The complete license text can be found at http://creativecommons.org/licenses/by/3.0/legalcode

COMP9242 S2/2018 W01 © 2017 Gernot Heiser. Distributed under CC Attribution License

Reducing Trusted Computing Base: Microkernels

• Idea of microkernel:
  – Flexible, minimal platform
  – Mechanisms, not policies
  – Actual OS functionality provided by user-mode servers
  – Servers invoked by kernel-provided message-passing mechanism (IPC)
  – Goes back to Nucleus [Brinch Hansen’70]
• IPC performance is critical!

[Figure: monolithic kernel (applications enter via syscall; VFS, IPC, file system, scheduler, virtual memory, device drivers and dispatcher all in kernel mode) versus microkernel (Unix server, file server and device driver in user mode, communicating by IPC; only IPC and virtual memory in kernel mode)]

Monolithic vs Microkernel OS Evolution

Monolithic OS
• New features add code to kernel
• New policies add code to kernel
• Kernel complexity grows

Microkernel OS
• Features add usermode code
• Policies replace usermode code
• Kernel complexity is stable
• Adaptable
• Dependable
• Highly optimised

1993 “Microkernel”: IPC Performance

[Figure: IPC cost (µs, 0 to 400) versus message length (B, 0 to 6000) on an i486 @ 50 MHz. Annotated values: 115 µs, L4 at 5 µs, and a raw-copy baseline. Culprit: cache footprint [Liedtke’95]]

Microkernel Principle: Minimality

A concept is tolerated inside the microkernel only if moving it outside the kernel, i.e. permitting competing implementations, would prevent the implementation of the system’s required functionality. [SOSP’95]

• Advantages of resulting small kernel:
  – Easy to implement, port? (limited by arch-specific micro-optimisations)
  – Easier to optimise
  – Hopefully enables a minimal trusted computing base (small attack surface, fewer failure modes)
  – Easier to debug, maybe even prove correct?
• Challenges:
  – API design: generality despite small code base
  – Kernel design and implementation for high performance

Microkernel Evolution

First generation
• Eg Mach [’87] (QNX, Chorus)
• In kernel: memory objects, low-level FS, swapping, devices, kernel memory, scheduling, IPC, MMU abstraction
• 180 syscalls
• 100 kSLOC
• 100 µs IPC

Second generation
• L4 [’95] (PikeOS, Integrity)
• In kernel: kernel memory, scheduling, IPC, MMU abstraction
• ~7 syscalls
• ~10 kSLOC
• ~1 µs IPC

Third generation
• seL4 [’09]
• In kernel: scheduling, IPC, MMU abstraction (memory management moved to a library)
• ~3 syscalls
• 9 kSLOC
• 0.1 µs IPC
• capabilities
• design for isolation

L4: A Family of High-Performance Microkernels

[Figure: L4 family tree, 1993 to 2013, distinguishing API inheritance from code inheritance. Kernels shown: L3 → L4, “X”, Hazelnut, Pistachio, L4/Alpha, L4/MIPS, L4-embed., OKL4 µKernel, OKL4 Microvisor, Codezero, seL4, Fiasco, Fiasco.OC, NOVA, P4 → PikeOS. Groups: GMD/IBM/Karlsruhe, UNSW/NICTA, Dresden, OK Labs, Commercial Clone. Callouts: first L4 kernel with capabilities; iOS security co-processor; Qualcomm modem chips]

Issues of 2G Microkernels

• L4 solved performance issue [Härtig et al, SOSP’97]
• Left a number of security issues unsolved
• Problem: ad-hoc approach to protection and resource management
  – Global thread name space ⇒ covert channels [Shapiro’03]
  – Threads as IPC targets ⇒ insufficient encapsulation
  – Single kernel memory pool ⇒ DoS attacks
  – Insufficient delegation of authority ⇒ limited flexibility, performance issues
  – Unprincipled management of time
• Addressed by seL4
  – Designed to support safety- and security-critical systems
  – Principled time management (MCS branch)

The seL4 Microkernel

Principles

• Single protection mechanism: capabilities
  – Now also for time [Lyons et al, EuroSys’18]
• All resource-management policy at user level
  – Painful to use
  – Need to provide standard memory-management library
    o Results in L4-like programming model
• Suitable for formal verification (proof of implementation correctness)
  – Attempted since ‘70s
  – Finally achieved by L4.verified project at NICTA [Klein et al, SOSP’09]

Concepts

• Capabilities (Caps)
  – mediate access
• Kernel objects:
  – Threads (thread-control blocks: TCBs)
  – Address spaces (page table objects: PDs, PTs)
  – Endpoints (IPC)
  – Notifications
  – Capability spaces (CNodes)
  – Frames
  – Interrupt objects (architecture specific)
  – Untyped memory
• System calls
  – Send, Wait (and variants)
  – Yield

What are (Object) Capabilities?

Cap = Access Token: prima-facie evidence of privilege

[Figure: a cap combines an object reference (eg. thread, file, …) with access rights (eg. read, write, send, execute, …)]

• Cap typically in kernel to protect from forgery
  – user references cap through handle
• OO API: err = method( cap, args );
• Used in some earlier microkernels:
  – KeyKOS [‘85], Mach [‘87], EROS [‘99]
• Linux “capabilities” are different

seL4 Capabilities

• Stored in cap space (CSpace)
  – Kernel object made up of CNodes
  – each an array of cap “slots”
• Inaccessible to userland
  – But referred to by pointers into CSpace (slot addresses)
  – These CSpace addresses are called CPTRs
• Caps convey specific privilege (access rights)
  – Read, Write, Execute, Grant (cap transfer)
• Main operations on caps:
  – Invoke: perform operation on object referred to by cap
    o Possible operations depend on object type
  – Copy/Mint/Grant: create copy of cap with same/lesser privilege
  – Move/Mutate: transfer to different address with same/lesser privilege
  – Delete: invalidate slot (cleans up object if this is the only cap to it)
  – Revoke: delete any derived (eg. copied or minted) caps
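The rule that a derived cap carries the same or lesser privilege can be sketched with a toy model in plain C. This is only an illustration: the type `toy_cap_t`, the `RIGHT_*` bits and both functions are hypothetical names invented here, not the kernel's data structures or API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy rights bits; illustrative only, not the kernel's encoding. */
enum { RIGHT_READ = 1u << 0, RIGHT_WRITE = 1u << 1, RIGHT_GRANT = 1u << 2 };

typedef struct {
    int      object;  /* which (toy) kernel object the cap refers to */
    uint32_t rights;  /* subset of the RIGHT_* bits */
    uint64_t badge;   /* 0 means "unbadged" in this model */
} toy_cap_t;

/* Mint: copy a cap, keeping only rights that the source holds AND the
 * caller requests, and attach a badge. Rights can never grow. */
static toy_cap_t toy_mint(toy_cap_t src, uint32_t rights_mask, uint64_t badge)
{
    toy_cap_t derived = src;
    derived.rights = src.rights & rights_mask;
    derived.badge = badge;
    return derived;
}

/* Invoke: permitted only if the cap carries every right the operation needs. */
static bool toy_may_invoke(toy_cap_t cap, uint32_t needed)
{
    return (cap.rights & needed) == needed;
}
```

Minting from a read/write cap with a mask of read and grant yields a read-only cap, because grant was never held by the source.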

Inter-Process Communication (IPC)

• Fundamental microkernel operation
  – Kernel provides no services, only mechanisms
  – OS services provided by (protected) user-level server processes
  – invoked by IPC

[Figure: Client and Server communicating through seL4 IPC (send, receive)]

• seL4 IPC uses a handshake through endpoints:
  – Transfer points without storage capacity
  – Message must be transferred instantly
    o Single-copy user ➞ user by kernel

IPC: Endpoints

[Figure: timeline of Thread1 and Thread2 alternating between Running and Blocked as they rendezvous: Recv (ep1_cap, …) pairs with Send (ep1_cap, …), then Recv (ep2_cap, …) pairs with Send (ep2_cap, …)]

• Threads must rendezvous for message transfer
  – One side blocks until the other is ready
  – Implicit synchronisation
• Message copied from sender’s to receiver’s message registers
  – Message is combination of caps and data words
    o Presently max 121 words (484B, incl message “tag”)
    o Should never use anywhere near that much!

Endpoints are Message Queues

[Figure: Server, Client1 and Client2 in user mode; in the kernel, TCB1 and TCB2 are queued on the EP. First invocation queues caller; further callers of same direction queue behind]

• EP has no sense of direction
• May queue senders or receivers
  – never both at the same time!
• Communication needs 2 EPs!
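The queueing rule (senders or receivers, never both) can be sketched as a small state machine. This is a toy model under simplifying assumptions: it only counts blocked threads instead of linking TCBs, and every name in it is hypothetical rather than taken from the kernel.

```c
typedef enum { EP_IDLE, EP_SENDERS_QUEUED, EP_RECEIVERS_QUEUED } ep_state_t;

typedef struct {
    ep_state_t state;
    int        queued;  /* number of threads blocked on the endpoint */
} toy_ep_t;

/* Returns 1 if a partner was waiting (message transferred at once),
 * 0 if the caller had to be queued, i.e. it would block. */
static int toy_ep_op(toy_ep_t *ep, ep_state_t my_queue, ep_state_t partner_queue)
{
    if (ep->state == partner_queue) {
        /* A thread of the opposite direction is waiting: rendezvous. */
        if (--ep->queued == 0) {
            ep->state = EP_IDLE;
        }
        return 1;
    }
    /* Idle, or already queueing callers of the same direction. */
    ep->state = my_queue;
    ep->queued++;
    return 0;
}

static int toy_ep_send(toy_ep_t *ep)
{
    return toy_ep_op(ep, EP_SENDERS_QUEUED, EP_RECEIVERS_QUEUED);
}

static int toy_ep_recv(toy_ep_t *ep)
{
    return toy_ep_op(ep, EP_RECEIVERS_QUEUED, EP_SENDERS_QUEUED);
}
```

Because an operation either consumes a waiting partner or joins a queue of its own direction, the endpoint can never hold senders and receivers at once.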

Client-Server Communication

• Asymmetric relationship:
  – Server widely accessible, clients not
  – How can server reply back to client (distinguish between them)?
• Client can pass (session) reply cap in first request
  – server needs to maintain session state
  – forces stateful server design
• seL4 solution: Kernel provides single-use reply cap
  – only for Call operation (Send+Recv)
  – allows server to reply to client
  – cannot be copied/minted/re-used but can be moved
  – one-shot (automatically destroyed after first use)

Call RPC Semantics

[Figure: sequence diagram between Client, Kernel and Server]

• Client: Call(ep,…)
• Server: Recv(ep,&rep)
• Kernel: mint rep, deliver to server
• Server: process, then Send(rep,…)
• Kernel: deliver to client, destroy rep
• Client: process

Stateful Servers: Identifying Clients

Stateful server serving multiple clients
• Must respond to correct client
  – Ensured by reply cap
• Must associate request with correct state
• Could use separate EP per client
  – endpoints are lightweight (16 B)
  – but requires mechanism to wait on a set of EPs (like select)
• Instead, seL4 allows caps to the same EP to be individually marked (“badged”)
  – server provides individually badged (session) caps to clients
    o separate endpoints for opening session, further invocations
  – server tags client state with badge
  – kernel delivers badge to receiver on invocation of badged caps

[Figure: Server holding separate Client1 state and Client2 state; each incoming Msg is matched to the state of the client that sent it]
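How a server might tag client state with a badge can be sketched in plain C. This is a minimal illustration, assuming a small fixed table and a linear search; `client_state_t`, `session_open`, `lookup_by_badge` and `handle_request` are hypothetical names, and the badge is passed in as a plain integer where a real server would obtain it from the receive call.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_CLIENTS 4

typedef struct {
    uint64_t badge;     /* badge minted into this client's session cap */
    int      requests;  /* example of per-client state */
    int      in_use;
} client_state_t;

static client_state_t clients[MAX_CLIENTS];

/* Session setup: record which badge identifies the new client. */
static client_state_t *session_open(uint64_t badge)
{
    for (size_t i = 0; i < MAX_CLIENTS; i++) {
        if (!clients[i].in_use) {
            clients[i] = (client_state_t){ badge, 0, 1 };
            return &clients[i];
        }
    }
    return NULL;  /* table full */
}

/* Map the badge delivered with a message back to the client's state. */
static client_state_t *lookup_by_badge(uint64_t badge)
{
    for (size_t i = 0; i < MAX_CLIENTS; i++) {
        if (clients[i].in_use && clients[i].badge == badge) {
            return &clients[i];
        }
    }
    return NULL;
}

/* Per request: returns the client's request count, or -1 for an unknown badge. */
static int handle_request(uint64_t badge)
{
    client_state_t *c = lookup_by_badge(badge);
    if (c == NULL) {
        return -1;
    }
    return ++c->requests;
}
```

The server waits on a single endpoint, yet each request still lands on the right state because the badge travels with the invocation.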

IPC Mechanics: Virtual Registers

• Like physical registers, virtual registers are thread state
  – context-switched by kernel
  – implemented as physical registers or thread-local memory
• Message registers
  – contain message transferred in IPC
  – architecture-dependent subset mapped to physical registers
    o 4 on ARM & x64, 2 on ia32
  – library interface hides details
    o 1st transferred word is special, contains message tag
  – API MR[0] refers to next word (not the tag!)
• Reply cap
  – overwritten by next receive!
  – can move to CSpace with cspace_save_reply_cap()
  – better model in “MCS” branch (merge soon)

IPC Message Format

[Figure: message layout: Tag followed by Message]

• Message:
  – Raw data
  – Caps (on Send) or Badges (on Receive)
  – CSpace reference for receiving caps (Receive only)
• Tag fields:
  – Label: meaning defined by IPC protocol (kernel or user)
  – Caps unwrapped: bitmap indicating caps which had badges extracted
  – # Caps: caps sent or received
  – Msg Length

Note: Don’t need to deal with this explicitly for project
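The tag fields can be pictured as bit fields packed into one machine word. The sketch below assumes a 64-bit word with a 7-bit length, 2-bit cap count, 3-bit unwrapped bitmap and the remaining bits for the label; these widths are an assumption for illustration and should be checked against the libsel4 headers for the target platform. The `toy_tag_*` helpers are hypothetical and only mirror the argument order used by `seL4_MessageInfo_new()` in the examples.

```c
#include <stdint.h>

/* Assumed layout (illustrative):
 *   bits 63..12  label
 *   bits 11..9   caps unwrapped (bitmap)
 *   bits  8..7   number of caps
 *   bits  6..0   message length in words
 */
static uint64_t toy_tag_new(uint64_t label, unsigned caps_unwrapped,
                            unsigned extra_caps, unsigned length)
{
    return (label << 12)
         | ((uint64_t)(caps_unwrapped & 0x7u) << 9)
         | ((uint64_t)(extra_caps & 0x3u) << 7)
         | (uint64_t)(length & 0x7fu);
}

static uint64_t toy_tag_label(uint64_t tag)
{
    return tag >> 12;
}

static unsigned toy_tag_caps_unwrapped(uint64_t tag)
{
    return (unsigned)((tag >> 9) & 0x7u);
}

static unsigned toy_tag_extra_caps(uint64_t tag)
{
    return (unsigned)((tag >> 7) & 0x3u);
}

static unsigned toy_tag_length(uint64_t tag)
{
    return (unsigned)(tag & 0x7fu);
}
```

Under this layout a tag built with arguments (0, 0, 0, 1), as in the client example, is simply the word 1: no label, no caps, one message word.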

Client-Server IPC Example

Client

    seL4_MessageInfo_t tag = seL4_MessageInfo_new(0, 0, 0, 1);
    seL4_SetTag(tag);        /* load into tag register */
    seL4_SetMR(0, 1);        /* set message register #0 */
    seL4_Call(server_c, tag);

Server

    /* allocate slot & retype to EP */
    ut_t *ut = ut_alloc(seL4_EndpointBits, &cspace);
    seL4_CPtr ep = cspace_alloc_slot(&cspace);
    err = cspace_untyped_retype(&cspace, ut->cap, ep, seL4_EndpointObject, seL4_EndpointBits);
    /* mint cap with badge 0xff */
    seL4_CPtr badged_ep = cspace_alloc_slot(&cspace);
    cspace_mint(&cspace, badged_ep, &cspace, ep, seL4_AllRights, 0xff);
    …
    seL4_Word badge;
    seL4_MessageInfo_t msg = seL4_Recv(ep, &badge);
    …
    seL4_MessageInfo_t reply = seL4_MessageInfo_new(0, 0, 0, 1);
    seL4_Reply(reply);       /* implicit use of reply cap */

Server Saving Reply Cap

Server

    ut_t *ut = ut_alloc(seL4_EndpointBits, &cspace);
    seL4_CPtr ep = cspace_alloc_slot(&cspace);
    err = cspace_untyped_retype(&cspace, ut->cap, ep, seL4_EndpointObject, seL4_EndpointBits);
    seL4_CPtr badged_ep = cspace_alloc_slot(&cspace);
    cspace_mint(&cspace, badged_ep, &cspace, ep, seL4_AllRights, 0xff);
    …
    seL4_Word badge;
    seL4_MessageInfo_t msg = seL4_Recv(ep, &badge);
    /* save reply cap in CSpace */
    seL4_CPtr reply_cap = cspace_alloc_slot(&cspace);
    cspace_save_reply_cap(&cspace, reply_cap);
    …
    seL4_MessageInfo_t reply = seL4_MessageInfo_new(0, 0, 0, 1);
    seL4_Send(reply_cap, reply);              /* explicit use of reply cap */
    cspace_free_slot(&cspace, reply_cap);     /* reply cap no longer valid */

IPC Operations Summary

• Send (ep_cap, …), Recv (ep_cap, …)
  – blocking message passing
  – needs Write, Read permission, respectively
• NBSend (ep_cap, …)
  – Polling send: silently discard message if receiver isn’t ready
• Call (ep_cap, …)
  – equivalent to Send (ep_cap,…) + reply-cap + Wait (rep_cap,…)
  – Atomic: guarantees caller is ready to receive reply
• Reply (…)
  – equivalent to Send (rep_cap, …)
• ReplyRecv (ep_cap, …)
  – equivalent to Reply (…) + Recv (ep_cap, …)
  – at present solely for efficiency of server operation

No failure notification where this reveals info on other entities! Need error handling protocol!

Notifications

• Logically, Notification is an array of binary semaphores
  – Multiple signalling, select-like wait
  – Not a message-passing IPC operation!
• Implemented by data word in Notification
  – Send OR-s sender’s cap badge to data word
  – Receiver can poll or wait
    o waiting returns and clears data word
    o polling just returns data word

[Figure: timeline of Thread1 and Thread2 alternating between Running and Blocked: w = Poll (not_cap, …), w = Wait (not_cap, …), and two Signal (not_cap, …) calls]
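The data-word mechanism can be sketched as a single-threaded toy model: signalling ORs a badge into the word, and a waiter receives the accumulated word, which is then cleared. The names `toy_ntfn_t`, `toy_signal` and `toy_wait` are hypothetical, blocking is not modelled, and a result of 0 merely stands for "nothing pending".

```c
#include <stdint.h>

typedef struct {
    uint64_t word;  /* accumulated badges of all signals not yet received */
} toy_ntfn_t;

/* Signal: OR the badge of the signalling cap into the data word. */
static void toy_signal(toy_ntfn_t *n, uint64_t badge)
{
    n->word |= badge;
}

/* What a waiter receives once woken: the accumulated word, which is
 * cleared. A result of 0 means a real Wait would have blocked. */
static uint64_t toy_wait(toy_ntfn_t *n)
{
    uint64_t w = n->word;
    n->word = 0;
    return w;
}
```

If each signaller is given a badge with a distinct bit set, the returned word tells the receiver which sources fired, which is the sense in which a Notification behaves like an array of binary semaphores. Repeated signals from one source before the wait collapse into a single bit.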

Receiving from EP and Notification

[Figure: Server between a Client and a Driver]

Server with synchronous and asynchronous interface
• Example: file system
  – synchronous (RPC-style) client protocol
  – asynchronous notifications from driver
• Could have separate threads waiting on endpoints
  – forces multi-threaded server, concurrency control
• Alternative: allow single thread to wait on both events
  – Notification is bound to thread with TCB_BindNotification()
  – thread waits on Endpoint
  – Notification delivered as if caller had been waiting on Notification

Derived Capabilities

• Badging is an example of capability derivation
• The Mint operation creates a new, less powerful cap
  – Can add a badge
    o Mint (cap, badge) ➞ badged cap
  – Can strip access rights
    o eg WR ➞ R/O
  – Remember, caps are kernel objects!
• Granting transfers caps over an Endpoint
  – Delivers copy of sender’s cap(s) to receiver
    o reply caps are a special case of this
  – Sender needs Endpoint cap with Grant permission
  – Receiver needs Endpoint cap with Write permission
    o else Write permission is stripped from new cap
• Retyping
  – Fundamental operation of seL4 memory management
  – Details later…

seL4 System Calls

• Notionally, seL4 has 6 syscalls (will change soon):
  – Yield(): invokes scheduler
    o only syscall which doesn’t require a cap!
  – Send(), Recv() and 3 variants/combinations thereof
    o Signal() is actually not a separate syscall but same as Send()
  – This is why I earlier said “approximately 3 syscalls”
• All other kernel operations are invoked by “messaging”
  – Invoking Call() on an object cap
    o Logically sending a message to the kernel
  – Each object has a set of kernel protocols
    o operations encoded in message tag
    o parameters passed in message words
  – Mostly hidden behind “syscall” wrappers

seL4 Memory-Management Principles

• Memory (and caps referring to it) is typed:
  – Untyped memory:
    o unused, free to Retype into something else
  – Frames:
    o (can be) mapped to address spaces, no kernel semantics
  – Rest: TCBs, address spaces, CNodes, EPs
    o used for specific kernel data structures
• After startup, kernel never allocates memory!
  – All remaining memory made Untyped, handed to initial address space
• Space for kernel objects must be explicitly provided to kernel
  – Ensures strong resource isolation
• Extremely powerful tool for shooting oneself in the foot!
  – We hide much of this behind the cspace and ut allocation libraries
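Retyping can be pictured as carving size-aligned objects out of an Untyped region. The sketch below is a toy model, assuming a simple watermark that only moves forward and objects aligned to their own power-of-two size; `toy_untyped_t` and `toy_retype` are hypothetical names, not the kernel's implementation or the course library.

```c
#include <stdint.h>

typedef struct {
    uint64_t base;       /* start address of the untyped region (non-zero here) */
    unsigned size_bits;  /* region covers 2^size_bits bytes */
    uint64_t watermark;  /* bytes already consumed from the start */
} toy_untyped_t;

/* Carve one object of 2^obj_bits bytes, aligned to its own size, from
 * the unused tail of the region. Returns the object's address, or 0 if
 * it does not fit. */
static uint64_t toy_retype(toy_untyped_t *ut, unsigned obj_bits)
{
    uint64_t size  = 1ull << obj_bits;
    uint64_t start = (ut->base + ut->watermark + size - 1) & ~(size - 1);
    uint64_t end   = ut->base + (1ull << ut->size_bits);

    if (start + size > end) {
        return 0;
    }
    ut->watermark = start + size - ut->base;
    return start;
}
```

With a 2^15 B region, one 16 B endpoint followed by 4 KiB frames leaves room for only seven frames rather than eight: alignment pushes the first frame to the next page boundary, which is one reason not to plan on turning all memory into frames.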

Capability Derivation

• Copy, Mint, Mutate, Revoke are invoked on CNodes

Mint (cnode_cap, dest, src, rights, badge)

  – CNode cap must provide appropriate rights
• Copy takes a cap for destination
  – Allows copying of caps between Cspaces
  – Alternative to granting via IPC (if you have privilege to access Cspace!)

Cspace Operations

    int cspace_create_two_level(cspace_t *bootstrap, cspace_t *target, cspace_alloc_t cspace_alloc);
    int cspace_create_one_level(cspace_t *bootstrap, cspace_t *target);
    void cspace_destroy(cspace_t *c);
    seL4_CPtr cspace_alloc_slot(cspace_t *c);
    void cspace_free_slot(cspace_t *c, seL4_CPtr slot);
    seL4_Error cspace_copy(cspace_t *dest, seL4_CPtr dest_cptr, cspace_t *src, seL4_CPtr src_cptr, seL4_CapRights_t rights);
    seL4_Error cspace_delete(cspace_t *cspace, seL4_CPtr cptr);
    seL4_Error cspace_mint(cspace_t *dest, seL4_CPtr dest_cptr, cspace_t *src, seL4_CPtr src_cptr, seL4_CapRights_t rights, seL4_Word badge);
    seL4_Error cspace_move(cspace_t *dest, seL4_CPtr dest_cptr, cspace_t *src, seL4_CPtr src_cptr);
    seL4_Error cspace_mutate(cspace_t *dest, seL4_CPtr dest_cptr, cspace_t *src, seL4_CPtr src_cap, seL4_Word badge);
    seL4_Error cspace_revoke(cspace_t *cspace, seL4_CPtr cptr);
    seL4_Error cspace_save_reply_cap(cspace_t *cspace, seL4_CPtr cptr);
    seL4_Error cspace_irq_control_get(cspace_t *dest, seL4_CPtr cptr, seL4_IRQControl irq_cap, int irq, int level);
    seL4_Error cspace_untyped_retype(cspace_t *cspace, seL4_CPtr ut, seL4_CPtr target, seL4_Word type, size_t size_bits);

cspace and ut libraries

[Figure: the user-level OS personality reaches seL4 through library calls and through seL4 system calls]

• ut library: ut_alloc(), ut_free(), …
  – Manages ≤4KiB Untyped
  – Extend for own needs!
• cspace library: cspace_create(), cspace_destroy(), …
  – Wraps messy Cspace tree & slot management

seL4 Memory Management Approach

[Figure: RAM is divided between the kernel and a Global Resource Manager (GRM), each holding its own data. The GRM delegates memory to Resource Managers (RM), which hold their own RM data and in turn provide address spaces to their clients]

• Strong isolation, no shared kernel resources
• Resources fully delegated, allows autonomous operation

Memory Management Mechanics: Retype

[Figure: derivation tree of Untyped memory UT0. Retype (Untyped, 2^1) splits an Untyped into smaller Untypeds (UT1, UT2, UT3, UT4, …); Retype (Frame, 2^2) produces frames F0 to F3 with r,w caps; Retype (CNode, 2^m, 2^n) and Retype (TCB, 2^n) produce kernel objects; Mint (r) derives a read-only cap; Revoke() removes derived caps. Actually more general (incremental retype)]

seL4 Address Spaces (VSpaces)

• Very thin wrapper around hardware page tables
  – Architecture-dependent
  – ARM & x86 similar (32-bit 2-level, 64-bit 4–5 level)
• ARM 64-bit ISA (AARCH64):
  – page global directory (PGD)
  – page upper directory (PUD)
  – page directory (PD)
  – page table (PT)
• A VSpace is represented by a PGD object:
  – Creating a PGD (by Retype) creates the VSpace
  – Deleting the PGD deletes the VSpace

[Figure: page-table hierarchy; PageTable_Map(PD) installs a page table in a page directory, Page_Map(PT) installs a frame in a page table]
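How a virtual address selects an entry at each of the four levels can be sketched as follows. The sketch assumes the common AArch64 configuration of 4 KiB pages and 9 index bits per level (48-bit virtual addresses); that configuration and the helper name `vspace_index` are assumptions made for illustration.

```c
#include <stdint.h>

enum { PAGE_BITS = 12, INDEX_BITS = 9 };

/* Index into the table at the given level for a virtual address.
 * level 0 = PGD, 1 = PUD, 2 = PD, 3 = PT. */
static unsigned vspace_index(uint64_t vaddr, unsigned level)
{
    unsigned shift = PAGE_BITS + INDEX_BITS * (3u - level);
    return (unsigned)((vaddr >> shift) & ((1u << INDEX_BITS) - 1u));
}
```

For the address 0xA0000000 used in the mapping examples, this gives PGD index 0, PUD index 2, PD index 256 and PT index 0, so mapping a frame there requires a PUD, a PD and a PT to exist on that path first.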

Address Space Operations

    seL4_Word paddr = 0;
    ut_t *ut = ut_alloc_4k_untyped(&paddr);
    seL4_CPtr frame = cspace_alloc_slot(&cspace);
    err = cspace_untyped_retype(&cspace, ut->cap, frame, seL4_ARM_SmallPageObject, seL4_PageBits);
    /* pgd_cap: cap to top-level page table */
    err = map_frame(&cspace, frame, pgd_cap, 0xA0000000, seL4_AllRights, seL4_Default_VMAttributes);

Each mapping has:
• virtual_address, phys_address, address_space and frame_cap
• address_space struct identifies the level 1 page_directory cap
• you need to keep track of (frame_cap, PD_cap, v_adr, p_adr)! (Poor API choice!)

    seL4_ARCH_Page_Unmap(frame);
    cspace_delete(&cspace, frame);
    cspace_free_slot(&cspace, frame);
    ut_free(ut, seL4_PageBits);

Multiple Frame Mappings: Shared Memory

    seL4_CPtr new_frame_cap = cspace_alloc_slot(&cspace);
    seL4_Error err = cspace_copy(&cspace, new_frame_cap, &cspace, frame, seL4_AllRights);
    err = map_frame(&cspace, new_frame_cap, pgd_cap, 0xA0000000, seL4_AllRights, seL4_Default_VMAttributes);

    seL4_ARCH_Page_Unmap(frame);
    cspace_delete(&cspace, frame);
    cspace_free_slot(&cspace, frame);
    seL4_ARCH_Page_Unmap(new_frame_cap);
    cspace_delete(&cspace, new_frame_cap);
    cspace_free_slot(&cspace, new_frame_cap);
    ut_free(ut, seL4_PageBits);

Each mapping requires its own frame cap even for the same frame

Memory Management Caveats

• The UT table handles allocation for you
• Very simple buddy-allocator, you need to understand how it works:
  – Freeing an object of size n: you can allocate new objects <= size n
  – Freeing 2 objects of size n does not mean that you can allocate an object of size 2n.

Object          Size (B), AARCH64   Alignment (B), AARCH64
Frame           2^12                2^12
PT/PD/PUD/PGD   2^12                2^12
Endpoint        2^4                 2^4
Notification    2^5                 2^5
Cslot           2^4                 2^4
Cnode           ≥ 2^12              2^12   (implementation choice!)
TCB             2^11                2^11
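Why two freed blocks of size n need not yield one block of size 2n can be sketched with the classic buddy rule: blocks merge only with their buddy, whose offset differs in exactly the size bit. This illustrates the textbook scheme under that assumption; it is not the ut library's code (which may be simpler still), and `buddy_of` and `can_coalesce` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Offset (relative to the pool start) of the buddy of a 2^bits-byte block. */
static uint64_t buddy_of(uint64_t offset, unsigned bits)
{
    return offset ^ (1ull << bits);
}

/* Two free 2^bits-byte blocks can merge into one 2^(bits+1)-byte block
 * only if each is the other's buddy. Merely being adjacent is not enough. */
static bool can_coalesce(uint64_t a, uint64_t b, unsigned bits)
{
    return buddy_of(a, bits) == b;
}
```

The 4 KiB blocks at offsets 0x0000 and 0x1000 are buddies and can form an 8 KiB block. Those at 0x1000 and 0x2000 are adjacent but not buddies, so freeing both still leaves only 4 KiB allocations possible.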

Memory-Management Caveats

• Objects are allocated by Retype() of Untyped memory
• The kernel will not allow you to overlap objects
  – But debugging nightmare if you try!!
• ut_alloc() and ut_free() manage user-level’s view of Untyped allocation.
  – Major pain if kernel and user’s view diverge
  – TIP: Keep an object’s address and CPtr together.
• Be careful with allocations!
• Don’t try to allocate all of physical memory as frames, you need more memory for TCBs, endpoints etc.
• Your frametable will eventually integrate with ut_alloc to manage the 4KiB untyped size.

[Figure: Untyped Memory of 2^15 B divided into 8 frames]

Threads

• Threads are represented by TCB objects
• They have a number of attributes (recorded in TCB):
  – VSpace: a virtual address space
    o page global directory (PGD) reference
    o multiple threads can belong to the same VSpace
  – CSpace: capability storage
    o CNode reference (CSpace root) plus a few other bits
  – Fault endpoint
    o Kernel sends message to this EP if the thread throws an exception
  – IPC buffer (backing storage for virtual registers)
  – stack pointer (SP), instruction pointer (IP), user-level registers
  – Scheduling priority and maximum controlled priority (MCP)
  – Time slice length (presently fixed; yes, this is broken!)
• These must be explicitly managed
  – … we provide an example you can modify

Threads

Creating a thread
• Obtain a TCB object
• Set attributes: Configure()
  – associate with VSpace, CSpace, fault EP, prio, define IPC buffer
• Set scheduling parameters
  – priority (maybe MCP)
• Set SP, IP (and optionally other registers): WriteRegisters()
  – this results in a completely initialised thread
  – will be able to run if resume_target is set in call, else still inactive
• Activate (make schedulable): Resume()

Creating a Thread in Own AS and Cspace

    static char stack[100];
    int thread_fct() {
        while(1);
        return 0;
    }
    /* Allocate and map new frame for IPC buffer as before */
    ut_t *ut = ut_alloc(seL4_TCBBits, &cspace);
    /* alloc slot to retype into */
    seL4_CPtr tcb = cspace_alloc_slot(&cspace);
    /* retype */
    err = cspace_untyped_retype(&cspace, ut->cap, tcb, seL4_TCBObject, seL4_TCBBits);
    err = seL4_TCB_Configure(tcb, fault_ep, cspace.root_cnode, seL4_NilData, seL4_CapInitThreadVSpace, seL4_NilData, PROCESS_IPC_BUFFER, ipc_buffer_cap);
    err = seL4_TCB_SetPriority(tcb, seL4_CapInitThreadTCB, PRIORITY);

If you use threads, write a library to create and destroy them.

Threads and Stacks

• Stacks are completely user-managed, kernel doesn’t care!
  – Kernel only preserves SP, IP on context switch
• Stack location, allocation, size must be managed by userland
• Beware of stack overflow!
  – Easy to grow stack into other data
    o Pain to debug!
  – Take special care with automatic arrays!

[Figure: Stack 1 and Stack 2 placed next to each other in memory]

    f () {
        int buf[10000];
        . . .
    }
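The danger of automatic arrays comes down to arithmetic: the array's size must fit into whatever remains of the user-managed stack. A minimal sketch of that check follows; `frame_fits` is a hypothetical helper and the stack sizes used with it are example values only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Would a function needing frame_bytes of locals still fit on a stack
 * of stack_size bytes of which `used` bytes are already occupied? */
static bool frame_fits(size_t stack_size, size_t used, size_t frame_bytes)
{
    return used <= stack_size && frame_bytes <= stack_size - used;
}
```

With 4-byte ints, `int buf[10000]` needs 40000 bytes, so it overflows a 4 KiB stack (and the 100-byte `stack` array of the thread-creation example by a wide margin), while it fits comfortably in 64 KiB. Nothing in the kernel checks this, so the overflow silently lands in whatever data lies beyond the stack.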

Creating a Thread in New AS and CSpace

    /* Allocate, retype and map new frame for IPC buffer as before
     * Allocate and map stack???
     * Allocate and retype a TCB as before
     * Allocate and retype a PageGlobalDirectoryObject of size seL4_PageDirBits
     * Mint a new badged cap to the syscall endpoint
     */
    cspace_t *new_cspace = ut_alloc(seL4_TCBBits);
    char *elf_base = cpio_get_file(_cpio_archive, app_name, &elf_size);
    seL4_Word sp = init_process_stack(&cspace, new_pgd_cap, elf_base);
    err = elf_load(&cspace, seL4_CapInitThreadVSpace, tty_test_process.vspace, elf_base);
    err = seL4_TCB_Configure(tcb, fault_ep, new_cspace->root_cnode, seL4_NilData, new_pgd_cap, seL4_NilData, PROCESS_IPC_BUFFER, ipc_buffer_cap);
    seL4_UserContext context = {
        .pc = elf_getEntryPoint(elf_base),
        .sp = sp,
    };
    err = seL4_TCB_WriteRegisters(tty_test_process.tcb, 1, 0, 2, &context);

seL4 Scheduling

• Present seL4 scheduling model is fairly naïve
  – Better model in “MCS” branch (merge soon)
• 256 hard priorities (0–255)
  – Priorities are strictly observed
  – The scheduler will always pick the highest-prio runnable thread
  – Round-robin scheduling within prio level
• Aim is real-time performance, not fairness
  – Kernel itself will never change the prio of a thread
  – Achieving fairness (if desired) is the job of user-level servers

[Figure: array of ready queues indexed by prio, 0 to 255]
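The scheduling policy (strict priorities, round-robin within a level) can be sketched with one small FIFO per priority. This is a toy model, assuming 255 is the highest priority and a fixed queue capacity; `run_queue_t`, `make_runnable` and `schedule` are hypothetical names, not the kernel's scheduler.

```c
#include <stddef.h>

#define NUM_PRIOS 256
#define QUEUE_CAP 8

typedef struct {
    int    ids[QUEUE_CAP];  /* ring buffer of thread ids */
    size_t head;
    size_t len;
} run_queue_t;

static run_queue_t ready[NUM_PRIOS];

/* Append a thread to the ready queue of its priority. Returns 0 on
 * success, -1 if the priority is invalid or the queue is full. */
static int make_runnable(unsigned prio, int tid)
{
    if (prio >= NUM_PRIOS || ready[prio].len == QUEUE_CAP) {
        return -1;
    }
    run_queue_t *q = &ready[prio];
    q->ids[(q->head + q->len) % QUEUE_CAP] = tid;
    q->len++;
    return 0;
}

/* Pick the head of the highest non-empty priority queue and rotate it
 * to the tail, giving round-robin within that level. Returns -1 if no
 * thread is runnable. */
static int schedule(void)
{
    for (int prio = NUM_PRIOS - 1; prio >= 0; prio--) {
        run_queue_t *q = &ready[prio];
        if (q->len == 0) {
            continue;
        }
        int tid = q->ids[q->head];
        q->head = (q->head + 1) % QUEUE_CAP;
        q->ids[(q->head + q->len - 1) % QUEUE_CAP] = tid;
        return tid;
    }
    return -1;
}
```

With thread 1 at priority 10 and threads 2 and 3 at priority 200, repeated calls return 2, 3, 2, 3, …; thread 1 never runs while a higher-priority thread stays runnable, which is exactly the "real-time, not fairness" trade-off.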

Exception Handling

• A thread can trigger different kinds of exceptions:
  – invalid syscall
    o may require instruction emulation or result from virtualization
  – capability fault
    o cap lookup failed or operation is invalid on cap
  – page fault
    o attempt to access unmapped memory
    o may have to grow stack, grow heap, load dynamic library, …
  – architecture-defined exception
    o divide by zero, unaligned access, …
• Results in kernel sending message to fault endpoint
  – exception protocol defines state info that is sent in message
• Replying to this message restarts the thread
  – endless loop if you don’t remove the cause for the fault first!

Interrupt Handling

[Figure: interrupt-handling cycle of the interrupt handler (driver)]

• Handler waits on Notification
• IRQ triggered: kernel fakes signal on Notification
• Handler performs appropriate action
• Kernel ACKs IRQ

Interrupt Management

• seL4 models IRQs as messages sent to a Notification
  – Interrupt handler has Receive cap on that Notification
• 2 special objects used for managing and acknowledging interrupts:
  – Single IRQControl object
    o single IRQControl cap provided by kernel to initial VSpace
    o only purpose is to create IRQHandler caps
  – Per-IRQ-source IRQHandler object
    o interrupt association and dissociation
    o interrupt acknowledgment
    o edge-triggered flag

[Figure: invoking Get(usb) on IRQControl yields an IRQHandler]

Interrupt Handling

• IRQHandler cap allows driver to bind Notification to interrupt
• Afterwards:
  – Notification is used to receive interrupt
  – IRQHandler is used to acknowledge interrupt

[Figure: SetEndpoint(notification) is invoked on the IRQHandler; the driver then loops over Wait(notification) and Ack(handler)]

    seL4_CPtr irq = cspace_alloc_slot(&cspace);
    seL4_Error err = cspace_irq_control_get(&cspace, irq, seL4_CapIRQControl, irq_number, true_if_edge_triggered);
    seL4_IRQHandler_SetNotification(irq, ntfn);
    seL4_IRQHandler_Ack(irq);    /* ACK to unmask IRQ */
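Why the acknowledgment matters can be sketched with a toy model in which delivering an interrupt masks the line until the driver acknowledges it. This is a simplification made for illustration: pending hardware interrupts are not modelled, and `toy_irq_t` and its functions are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     masked;     /* line stays masked from delivery until Ack */
    uint64_t ntfn_word;  /* data word of the bound Notification */
    uint64_t badge;      /* badge signalled on behalf of this IRQ */
    int      delivered;  /* how many interrupts reached the driver */
} toy_irq_t;

/* Hardware raises the IRQ: it is delivered only if the line is unmasked. */
static bool toy_irq_raise(toy_irq_t *irq)
{
    if (irq->masked) {
        return false;
    }
    irq->ntfn_word |= irq->badge;  /* signal on the Notification */
    irq->masked = true;
    irq->delivered++;
    return true;
}

/* Driver: receive the pending word (cleared on return). */
static uint64_t toy_irq_wait(toy_irq_t *irq)
{
    uint64_t w = irq->ntfn_word;
    irq->ntfn_word = 0;
    return w;
}

/* Driver: acknowledge, which unmasks the line again. */
static void toy_irq_ack(toy_irq_t *irq)
{
    irq->masked = false;
}
```

In this model a driver that handles the interrupt but forgets the Ack receives exactly one interrupt and then waits forever, a failure that looks like a hung device.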

Device Drivers

• In seL4 (and all other L4 kernels) drivers are usermode processes
• Drivers do three things:
  – Handle interrupts (already explained)
  – Communicate with rest of OS (IPC + shared memory)
  – Access device registers
• Device register access
  – Devices are memory-mapped on ARM
  – Have to find frame cap from bootinfo structure
  – Map the appropriate page in the driver’s VSpace

    device_vaddr = sos_map_device(&cspace, 0xA0000000, BIT(seL4_PageBits));
    …
    /* magic device register access */
    *((volatile uint32_t *) device_vaddr) = …;

Project Platform: ODROID-C2

[Figure: ODROID-C2 board with an Amlogic S905 SoC containing four ARMv8 Cortex-A53 cores, a timer, serial, Ethernet and other peripherals, plus 2 GiB memory. The serial connector carries seL4_DebugPutChar() output. The Ethernet connector is used for M0 (serial over LAN for userlevel apps) and M6 (Network File System, NFS)]

seL4 in the Real World (Courtesy Boeing, DARPA)
