Operating Systems Engineering

Operating Systems Engineering Recitation 2: Boot and first process Based on MIT 6.828 (2014, lec3-5) Focus on xv6 • An educational OS based on UNIX V6 • only a few abstractions \ services • Processes • File system • I/O (via file descriptors) Power on (“boot”) the machine ● Initial state – nothing in RAM, need to read kernel from disk ● BIOS in charge, start bootloader (sheets 84-85) – copy first “boot” sector to 0x7c00 in mem – boot sector = bootasm.S + bootmain.c – executing at end of bootasm.S – stack below 0x7c00, so we can call into C – call bootmain xv6 – bootloader (1) ● Bootloader in charge: – bootmain() – sheet 85 ● Two jobs: – copy kernel from disk to mem (0x100000 phys addr) ● not file–-sectors, raw disk ● linker writes in ELF format – jump to kernel's first instruction ● elf->entry ● objdump -f kernel or readelf -e kernel ● kernel.ld and entry.S xv6 – bootloader (2) ● Why phys address 0x100000 (1MB)? – can't use 0x0 → memory mapped devices 0x0->1M – can use 0x200000 (2MB)? ● yes! it is possible ● Bootloader can load kernel to phys address if – it is a DRAM address – kernel must be able to find itself ● 0x100000 satisfies both conditions xv6 – bootloader (3) ● Where bootmain jumps? – entry->elf_entry from ELF header – not 0x100000 but 0x10000c ● this is "start", in entry.S, sheet 10 – linker put 0x10000c in the ELF header ● (gdb) b *0x10000c (entry.S) ● (gdb) si ● continue and then si to jmp *%eax ● (gdb) #0 0x801033b2 in main () at main.c:19 xv6 – processes • Process has user space memory: – instructions - actual computation flow – data - variables used in computation – stack – organize procedures calls • Per-process state private to the kernel – page table – kernel stack – file descriptor table xv6 – Process Isolation • Prevent process X from spying on Y • Prevent process X from corrupting Y – Separated memory, file descriptors – Prevent resource exhaustion (fairness) • Protect kernel from processes • Defensive tactic – Against buggy programs – Against malicious programs xv6 – Isolation Mechanisms • User/Kernel mode flag • System call abstraction • Address spaces • Timeslicing User/Kernel Mode Flag • Called CPL in x86 • Bottom two bits of the cs register cs: CPL • CPL=0 – kernel mode – privileged • CPL=3 – user mode – not privileged User/Kernel Mode Flag ● CPL is the base to almost every isolation – CPL in low 2 bits of CS – CPL=0 -> can modify cr*, devices, can use any PTE – CPL=3 -> can't modify cr*, or use devs, and PTE_U enforced ● Writes to control registers (cs, for instance) ● Writes to certain flags ● Memory access ● I/O Port access ● However, setting CPL=3 is not enough ● Kernel needs to manage policy System calls ● Call from user to kernel – needs to change CPL ● Can this be done? – set CPL=0 – jump sys_open() ● How about a combined instruction that forces the user to jump to a kernel address? System calls - x86 solution ● Kernel sets allowed entry points ● int instruction sets CPL=0 and jumps – saves the values of cs and eip on stack – system call returns with iret – restores old cs and eip ● Should these instructions be privileged? xv6 – First Process (1) ● Each process state in struct proc (2103) ● Process states – UNUSED, EMBRYO, SLEEPING, RUNNABLE, RUNNING, ZOMBIE ● Each process address space maps – program memory (<0x80000000) – kernel instructions and data (>0x80100000) xv6 – First Process ● Each process has two stacks: user and kernel ● Thread of execution (aka thread) – p->kstack + code – p->state – p->pgdir ● When in user mode kernel stack empty ● When in kernel (syscall) user stack contains data but not used – thread state stored in kernel stack: ● local variables ● return address xv6 – Virtual Memory Layout ● 0x80000000=KERNBASE ● memlayout.h (0207) xv6 – First Address Space ● main.c → main() sheet 12 ● First process (see userinit, 2252) – allocproc() sheet 22, set up stack for "returning" to user space ● save trapret (3027) ● p->context->eip = forkret (2533) , ret will run forkret ● forkret will return to trapret (3027), then to userpace – Fill in kernel part of address space (setupkvm) – Fill in user part of address space ● 1 page containing initcode (see initcode.S) – Setup trapframe to exit kernel ● User-mode bit ● tf->eip = 0 (beginning of initcode.S) ● User-stack lives at top of 1 page of initcode – Set process to runnable xv6 – trap frame ● Kernel stack after allocproc() ● Ready to return to user space ● trapasm.S (3027) : trapret: popal popl %gs popl %fs popl %es popl %ds addl $0x8, %esp # trapno # and errcode iret xv6 – First System Call exec() ● initcode.S (7708) calls exec “/init” ● Instruction int enters kernel again ● System call exec (3207) – replace initcode with /init binary – run /init which ● creates new console ● start shell ● handles orphaned zombies ● system is up Questions? Backup Slides xv6 – A Monolithic Kernel • Kernel is a big program • Contains all services, low level hardware mechanisms • Entire kernel runs with full privileges • Pros – easy kernel subsystem interactions • Cons – complex interactions => bugs => system crash – no isolation in the kernel • Unix, Linux, BSD family, Solaris, xv6 Micro Kernel • Kernel is a small program – A micro kernel tries to run most services as daemons in user space. • Only kernel runs with full privileges • Microkernel is essentially a high speed context-switching engine • Pros – complex service interactions => bugs => service crash but system alive! – kernel isolated from services, services isolated from user • Cons – complex OS subsystem interactions using IPC – a lot of of messaging and context switching involved ● MINIX, QNX, L4 Exo Kernel • Kernel is a very small program – concept of an exokernel is orthogonal to that of micro- vs. monolithic kernels. – there are no forced abstractions – security separated from abstraction • Kernel and users can run with full privileges • Pros – simplicity and performance – freedom: users can implement their own optimal subsystems • Cons – additional effort from users and system maintainers ● JOS, nonkernel, BareMetal OS.

Operating Systems Engineering

Unix Protection

A Practical UNIX Capability System

System Calls System Calls

Advanced Components on Top of a Microkernel

Lecture Notes in Assembly Language

Operating System Support for Parallel Processes

Android Porting Guide Step by Step

Cristina Opriceana, Hajime Tazaki (IIJ Research Lab.) Linux Netdev 2.2, Seoul, Korea 08 Nov

A Component-Based Environment for Android Apps

Installing Management Node Remotely

View the Slides

Capability Myths Demolished