Interrupt and System Call in Linux

Today
▪ Interrupts in Linux
▪ System calls in Linux

Monolithic kernel
All OS components run in kernel mode
[Figure: APP in user mode; FS, Mem, and Net in kernel mode]
Why good?
▪ Can be efficient: cross-component access is cheap

Why bad?
▪ No boundaries: big, complex kernel that is hard to change
• Hard to do new things in the OS (OS researchers unhappy)
• No flexibility for apps; hard to customize for speed (e.g., a web server)
▪ Large trusted computing base (TCB): one error can crash or compromise the entire kernel

Virtual Machine
Virtual Machine Monitor (VMM): kernel that provides a hardware interface
[Figure: three APPs, each on its own guest OS, all running above the VMM; labels: User mode, Kernel mode]

Why good?
▪ Isolation: strong protection between VMs
▪ Consolidation: one physical machine, multiple VMs
▪ Mobility: VMs can be moved around
▪ Standardization: same hardware interface, better system management

Virtual Machine (cont)
Normal operating system environment:
▪ running in supervisor mode
▪ full access to machine state and I/O devices

Virtualized guest operating systems:
▪ running in user mode
▪ no direct access to machine state

Tasks of the virtual machine monitor:
▪ reconciling the virtual and physical architecture
▪ preventing virtual machines from interfering with each other or with the monitor
▪ doing it fast: not an easy job

Linux kernel structure
Core + dynamically loadable modules
Modules include: device drivers, file systems, network protocols, etc
Modules were originally developed to support the conditional inclusion of device drivers
▪ Early OS kernels needed to either:
• include code for all possible devices, or
• be recompiled to add support for a new device
▪ Now, modules can be dynamically loaded and unloaded
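As a rough user-space analogy (not kernel code), dynamic loading can be pictured as adding and removing a driver's entry points in a table at run time, instead of compiling every driver in. All names below (`struct driver`, `register_driver`, `unregister_driver`, `find_driver`, `demo_driver`) are hypothetical and only illustrate the idea; real modules are loaded and unloaded with tools such as `insmod` and `rmmod`.

```c
#include <stddef.h>
#include <string.h>

#define MAX_DRIVERS 8

/* Hypothetical driver record: a name plus one entry point. */
struct driver {
    const char *name;
    int (*probe)(void);
};

/* Table of currently "loaded" drivers; NULL marks a free slot. */
static struct driver *drivers[MAX_DRIVERS];

/* "Load": put the driver in the first free slot. Returns 0, or -1 if full. */
int register_driver(struct driver *d)
{
    for (size_t i = 0; i < MAX_DRIVERS; i++) {
        if (drivers[i] == NULL) {
            drivers[i] = d;
            return 0;
        }
    }
    return -1;
}

/* "Unload": remove the driver if present. Returns 0, or -1 if not found. */
int unregister_driver(const struct driver *d)
{
    for (size_t i = 0; i < MAX_DRIVERS; i++) {
        if (drivers[i] == d) {
            drivers[i] = NULL;
            return 0;
        }
    }
    return -1;
}

/* Look a driver up by name; NULL means it is not currently loaded. */
struct driver *find_driver(const char *name)
{
    for (size_t i = 0; i < MAX_DRIVERS; i++) {
        if (drivers[i] != NULL && strcmp(drivers[i]->name, name) == 0)
            return drivers[i];
    }
    return NULL;
}

/* Example driver used for illustration only. */
static int demo_probe(void) { return 42; }
struct driver demo_driver = { "demo", demo_probe };
```

Support for the "demo" device exists only between the calls to `register_driver(&demo_driver)` and `unregister_driver(&demo_driver)`; the rest of the program never needs to be rebuilt.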
Modules are used extensively

Linux kernel structure (cont.)
[Figure: layered kernel structure, top to bottom]
Applications
System Libraries (libc)
System Call Interface
I/O Related: File Systems, Networking, Device Drivers
Process Related: Scheduler, Memory Management, IPC
Modules
Architecture-Dependent Code
Hardware

Types of Interrupts on 80386
Interrupts: asynchronous, from external devices, not related to the code that is running
▪ Maskable interrupts
▪ Nonmaskable interrupts (NMI): hardware errors

Exceptions: synchronous, raised by the CPU
▪ Processor-detected exceptions:
• Faults: correctable; the offending instruction is retried
• Traps: often used for debugging; the instruction is not retried
• Aborts: major error (hardware failure); the saved RIP may be wrong
▪ Programmed exceptions:
• Requests for kernel intervention (software interrupts / system calls)

Faults
The instruction would be illegal to execute
Examples:
▪ Writing to a memory segment marked ‘read-only’
▪ Reading from an unavailable memory segment (e.g., on disk): page fault
▪ Executing a ‘privileged’ instruction

Detected before incrementing the IP
The causes of ‘faults’ can often be ‘fixed’

Traps
A CPU might have been programmed to automatically switch control to a ‘debugger’ program after it has executed an instruction
That type of situation is known as a ‘trap’
It is activated after incrementing the IP

Handling Exceptions
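From the handler's point of view, the fault/trap distinction above comes down to which instruction pointer has been saved. A minimal sketch, using a toy model in which every instruction is `INSN_LEN` bytes long (an assumption for illustration; real x86 instructions vary in length). The names `saved_ip` and `enum exc_class` are hypothetical.

```c
#include <stdint.h>

/* Toy assumption: fixed-length instructions. */
#define INSN_LEN 4u

enum exc_class { EXC_FAULT, EXC_TRAP };

/*
 * Return the instruction pointer saved for the handler.
 * Fault: detected before the IP is incremented, so the saved IP still
 *        points at the offending instruction, which is retried on return.
 * Trap:  raised after the IP is incremented, so execution resumes at the
 *        next instruction.
 */
uint64_t saved_ip(enum exc_class cls, uint64_t insn_addr)
{
    return cls == EXC_FAULT ? insn_addr : insn_addr + INSN_LEN;
}
```

In this model, a page fault on the instruction at 0x1000 resumes at 0x1000 once the cause has been fixed, while a debug trap taken after the same instruction resumes at 0x1004.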
Most error exceptions (divide by zero, invalid operation, illegal memory reference, etc.) translate directly into signals
This isn’t a coincidence...
The kernel’s job is fairly simple: send the appropriate signal to the current process
▪ force_sig(sig_number, current);
That will probably kill the process, but that’s not the concern of the exception handler
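The exception-to-signal translation can be observed from user space. The sketch below assumes Linux on typical hardware, where address 0 is not mapped in an ordinary process; `signal_from_bad_write` is a hypothetical helper name. A child process performs an illegal memory reference, and the parent reports which signal ended it.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that writes through a null pointer. The resulting
 * exception is expected to be turned into a signal for the child.
 * Returns the number of the signal that killed the child, or -1 on
 * error or if the child exited normally.
 */
int signal_from_bad_write(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* volatile pointer variable: the compiler cannot assume its value */
        int *volatile p = NULL;
        *p = 1;       /* illegal memory reference */
        _exit(0);     /* not reached if the signal is delivered */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

On such a system the function is expected to return `SIGSEGV`: the process is killed by the signal, and the exception handler itself never had to decide that.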
One important exception: page fault
An exception can (infrequently) happen in the kernel
▪ die(); // kernel oops

Interrupt # assignment
256 possible interrupt ID numbers (0-255)
The first 32 (0-31) are reserved by Intel for NMI and exceptions
OSes such as Linux are free to use the remaining 224 interrupt ID numbers for their own purposes (e.g., for service requests from external devices, or for other purposes such as system calls)
Examples:
▪ 0: divide error (divide-overflow fault)
▪ 3: breakpoint
▪ 8: double fault (fault while handling an interrupt or exception)
▪ 14: page-fault exception
▪ 128: system calls

Interrupt Descriptor Table
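The IDT is indexed by the interrupt ID numbers listed above, so it helps to have their ranges in mind. A small sketch of that numbering; `enum vector_kind` and `classify_vector` are hypothetical names used only for illustration.

```c
enum vector_kind {
    VEC_INVALID,     /* outside 0-255 */
    VEC_RESERVED,    /* 0-31: reserved by Intel for NMI and exceptions */
    VEC_SYSCALL,     /* 128 (0x80): system calls on Linux */
    VEC_OS_DEFINED   /* the other vectors in 32-255: free for the OS to assign */
};

/* Classify an interrupt ID number according to the ranges above. */
enum vector_kind classify_vector(int vector)
{
    if (vector < 0 || vector > 255)
        return VEC_INVALID;
    if (vector < 32)
        return VEC_RESERVED;
    if (vector == 128)
        return VEC_SYSCALL;
    return VEC_OS_DEFINED;
}
```

For example, vector 14 (page fault) falls in the reserved range, while vector 128 is one of the 224 OS-assignable numbers that Linux uses for system calls.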
The ‘entry-point’ to the interrupt handler is located via the Interrupt Descriptor Table (IDT)
IDT entries are “gate descriptors”:
▪ Location of handler
▪ Descriptor Privilege Level (DPL), to prevent bad access
• Can invoke only when current privilege level (CPL) <= DPL
• This is just the mode bit for protection
▪ Gates (slightly different ways of entering the kernel)
• Interrupt gate: disables further interrupts
• Trap gate: further interrupts still allowed
• Task gate: includes the TSS to transfer to (used when RIP is bad, or on hardware failure)

Loading an Interrupt handler
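Entering the kernel through a gate, with the privilege check and the interrupt-gate/trap-gate difference described above, can be sketched as follows. This is a simplified model, not the real x86 descriptor layout: `struct gate`, `struct cpu`, and `enter_gate` are hypothetical, saving of the old state is omitted, and the DPL check is shown as it applies to software-raised interrupts such as `int 0x80`.

```c
#include <stdbool.h>
#include <stdint.h>

enum gate_type { GATE_INTERRUPT, GATE_TRAP };

/* Simplified gate descriptor (hypothetical layout). */
struct gate {
    uint64_t       handler;  /* location of the handler */
    unsigned       dpl;      /* descriptor privilege level: 0 (kernel) to 3 (user) */
    enum gate_type type;
};

/* Minimal CPU state for the model. */
struct cpu {
    unsigned cpl;      /* current privilege level */
    bool     if_flag;  /* IF bit: true means maskable interrupts are enabled */
    uint64_t ip;
};

/*
 * Enter the kernel through gate `g` for a software-raised interrupt.
 * Returns true on success, or false if the privilege check fails
 * (real hardware would raise a general-protection fault instead).
 */
bool enter_gate(struct cpu *c, const struct gate *g)
{
    if (c->cpl > g->dpl)          /* allowed only when CPL <= DPL */
        return false;

    if (g->type == GATE_INTERRUPT)
        c->if_flag = false;       /* interrupt gate: disable further interrupts */
    /* trap gate: IF is left unchanged */

    c->cpl = 0;                   /* the handler runs in kernel mode */
    c->ip  = g->handler;
    return true;
}
```

With this model, user code (CPL 3) can enter through a gate whose DPL is 3, such as a system-call gate, but not through a gate whose DPL is 0.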
Hardware locates the proper gate descriptor for this interrupt vector, and locates the new context
Verifies Current Privilege Level (CPL) <= Descriptor Privilege Level (DPL)
Loads a new stack pointer if needed
Hardware saves the old IP and flags on the stack (the syscall instruction instead saves the old IP in rcx and the old flags in r11)
Sets IP, etc. to the interrupt handler = invokes the handler
▪ Interrupt gates disable interrupts by clearing the IF bit in the eflags register
Handler saves the rest of the old CPU state on the stack

The system-call jump-table
There are approximately 300 system calls
Any specific system call is selected by its ID number (placed into register rax)
It would be inefficient to use if-else tests or even a switch statement to transfer to the service routine’s entry point
Instead, an array of function pointers is directly accessed (using the ID number)
This array is named ‘sys_call_table[]’
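The table-driven dispatch can be sketched in user-space C. The table name follows the slide; the three service routines, their ID numbers, the `syscall_fn` type, and `do_syscall` are hypothetical stand-ins, not the kernel's actual entries. The `nr` argument plays the role of the value in rax.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical uniform signature for service routines. */
typedef long (*syscall_fn)(long, long, long);

static long sys_demo_add(long a, long b, long c)  { (void)c; return a + b; }
static long sys_demo_neg(long a, long b, long c)  { (void)b; (void)c; return -a; }
static long sys_demo_sum3(long a, long b, long c) { return a + b + c; }

/* Array of function pointers indexed by system-call ID number. */
static const syscall_fn sys_call_table[] = {
    [0] = sys_demo_add,
    [1] = sys_demo_neg,
    [3] = sys_demo_sum3,   /* slot 2 is intentionally left empty (NULL) */
};

#define NR_SYSCALLS (sizeof sys_call_table / sizeof sys_call_table[0])

/*
 * Dispatch by direct indexing: one bounds check and one indirect call,
 * regardless of how many system calls exist.
 */
long do_syscall(unsigned long nr, long a, long b, long c)
{
    if (nr >= NR_SYSCALLS || sys_call_table[nr] == NULL)
        return -ENOSYS;    /* unknown or unimplemented system call */
    return sys_call_table[nr](a, b, c);
}
```

Unlike a chain of if-else tests, the cost of `do_syscall` does not grow with the number of entries; adding a system call means adding one slot to the table.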