<<

Operating Systems

Practical Session 1 System Calls A few administrative notes…

• Course homepage: • http://www.cs.bgu.ac.il/~os182/Main • Assignments: ❑ Extending xv6 (a pedagogical OS) ❑ Submission in pairs. ❑ Frontal checking: 1. Assume the grader may ask anything. 2. Must register to exactly one checking session.

An operating system (OS) is system software that manages computer hardware and software resources and provides common services for computer programs. Main Components for today

• Process – Describes an execution of a program and all of its requirements • Kernel – The Heart of the OS, manages processes and other resources • System Calls – Means of communications between the process and the kernel Process

Process

Kernel Space

Run Kernel code Run user code Has: Has: - Stack - Stack - Heap - Heap - System wide view - Direct Hardware access

System calls System Calls

• A is an interface between a user application and a service provided by the operating system (or kernel). • These can be roughly grouped into five major categories: 1. Process control (e.g. create/terminate process) 2. File Management (e.g. , ) 3. Device Management (e.g. logically attach a device) 4. Information Maintenance (e.g. set time or date) 5. Communications (e.g. send messages) System Calls - motivation

• A process is not supposed to have a direct access the hardware/kernel. It can’t access the kernel memory or functions. • This is strictly enforced (‘protected mode’) for good reasons: • Can jeopardize other processes running. • Cause physical damage to devices. • Alter system behavior. • The system call mechanism provides a safe mechanism to request specific kernel operations. Jumping from user space to kernel space • A process running in user space cannot run code/access data structures in the kernel space • In x86 arch, in order to jump to kernel space, it is common that the process will use interrupts • When jumping to kernel space, the process (kernel) must store a “backup” for its current execution state (so that the kernel will be able to resume the execution later), this backup is referred to as a trapframe. System Calls - interface

• Calls are usually made with C/C++ library functions:

User Application C - Library Kernel System Call getpid() Load arguments, eax ← _NR_getpid, kernel mode (int 80) Call Sys_Call_table[eax] sys_getpid()

syscall_exit return

resume_userspace return

User-Space Kernel-Space

Remark: Invoking int 0x80 is common although newer techniques for “faster” control transfer (SYSCALL/SYSRET) are provided by both AMD’s and Intel’s architecture. XV6 CODE

10 System Calls - interface syscall.h // System call numbers #define • Calls are usually made with C/C++ librarySYS_fork functions: 1 #define SYS_exit 2 #define User Application C - Library KernelSYS_wait 3System #define Call SYS_pipe 4 #define getpid() Load arguments, SYS_read 5 #define eax ← _NR_getpid, SYS_kill 6 #define kernel mode (int 80) SYS_exec 7 #define SYS_fstatCall 8 #define Sys_Call_table[eax]SYS_chdir 9 #definesys_getpid() SYS_dup 10 #define SYS_getpid 11 #define return syscall_exit SYS_sbrk 12 #define SYS_sleep 13 #define resume_userspace SYS_uptime 14 #define SYS_open 15 #define return SYS_write 16 #define SYS_mknod 17 #define SYS_unlink 18 #define SYS_link 19 #define User-Space Kernel-SpaceSYS_mkdir 20 #define SYS_close 21 Remark: Invoking int 0x80 is common although newer techniques for “faster” control transfer are provided by both AMD’s and Intel’s architecture. usys.S #include "syscall.h" #include "traps.h" .globl fork; \ #defineSystem SYSCALL(name) Calls \ - interface fork : \ .globl name; \ name: \ movl $SYS_fork, %eax; \ movl $SYS_ ## name, %eax; \ int $T_SYSCALL; \ int• $T_SYSCALL;Calls are \usually made with C/C++ret library functions: ret

User Application C - Library Kernel System Call SYSCALL(fork) SYSCALL(exit)

SYSCALL(wait) getpid() Load arguments, ← SYSCALL(pipe) eax _NR_getpid, kernel mode (int 80) SYSCALL(read) Call SYSCALL(write) Sys_Call_table[eax] sys_getpid() SYSCALL() SYSCALL(kill) .globl fork; \ SYSCALL() syscall_exitfork : \ return SYSCALL() movl $1, %eax; \ SYSCALL(mknod) resume_userspace int $64; \ SYSCALL(unlink) SYSCALL(fstat) ret return SYSCALL(link) SYSCALL(mkdir) SYSCALL (chdir) SYSCALL() User-Space Kernel-Space SYSCALL(getpid) SYSCALLRemark:( Invokingsbrk) int 0x80 is common although newer techniques for “faster” control transfer are provided SYSCALLby both( AMD’ssleep and) Intel’s architecture. SYSCALL(uptime) System Calls - interface

trapasm.S .globl alltraps •alltrapsCalls :are usually made with C/C++ library functions: # Build trap frame. pushl User%ds Application C - Library Kernel System Call pushl %es pushl %fs getpid() Load arguments, pushl %gs eax ← _NR_getpid, pushal kernel mode (int 80) Call . Sys_Call_table[eax] sys_getpid() . . syscall_exit return # Call trap(tf), where tf=%esp pushl %esp resume_userspace call trap . return . . User-Space Kernel-Space

Remark: Invoking int 0x80 is common although newer techniques for “faster” control transfer are provided by both AMD’s and Intel’s architecture. System Calls - interface syscall.c trap.c static int (*syscalls[])(void) = { [•SYS_forkCalls] sys_forkare usually, made with C/C++ libraryVoid trap(struct functions: trapframe* tf) [SYS_exit] sys_exit, { . . . User Application C - Library Kernel System Call . . [SYS_close] sys_close, . }; getpid() Load arguments, eax ← _NR_getpid, if(tf->trapno == T_SYSCALL){ if(myproc()->killed) void syscall(void) { kernel mode (int 80) int num; Call exit (); struct proc *curproc = myproc(); Sys_Call_table[eax] myproc()->tf =sys_getpid() tf; syscall(); num = curproc->tf->eax; if(myproc()->killed) exit(); if(num > 0 && num < NELEM(syscalls) && syscall_exit return syscalls[num]) { return;

curproc->tf->eax = syscalls[num](); } } else { resume_userspace . cprintf("%d %s: unknown sys call %d\n", . curproc->pid, curproc->name, num); return curproc->tf->eax = -1; . } } User-Space Kernel-Space

Remark: Invoking int 0x80 is common although newer techniques for “faster” control transfer are provided by both AMD’s and Intel’s architecture. System Calls - interface

• Calls are usually made with C/C++ library functions: trapasm.S User Application C - Library Kernel System Call . . getpid() Load arguments, . eax ← _NR_getpid, addl $4, %esp kernel mode (int 80) Call sys_getpid() # Return falls through to trapret... Sys_Call_table[eax] .globl trapret trapret: return popal syscall_exit popl %gs popl %fs resume_userspace popl %es popl %ds return addl $0x8, %esp # trapno and errcode iret User-Space Kernel-Space

Remark: Invoking int 0x80 is common although newer techniques for “faster” control transfer are provided by both AMD’s and Intel’s architecture. System Calls – tips

• Kernel behavior can be enhanced by altering the system calls themselves: imagine we wish to write a message (or add a log entry) whenever a specific user is opening a file. We can re-write the system call open with our new open function and load it to the kernel (need administrative rights). Now all “open” requests are passed through our function. • We can examine which system calls are made by a program by invoking strace. Process Control

Kernel Space

Proc 1 Proc 2 Proc 3 Proc 4 (running) (sleep) (ready) (ready) Process Control Block proc.h enum procstate { UNUSED, EMBRYO, SLEEPING, RUNNABLE, RUNNING, ZOMBIE };

// Per-process state struct proc { uint sz; // Size of process memory (bytes) pde_t* pgdir; // Page table char *kstack; // Bottom of kernel stack for this process enum procstate state; // Process state int pid; // Process ID struct proc *parent; // Parent process struct trapframe *tf; // Trap frame for current syscall struct context *context; // swtch() here to run process void *chan; // If non-zero, sleeping on chan int killed; // If non-zero, have been killed struct file *ofile[NOFILE]; // Open files struct inode *cwd; // Current char name[16]; // Process name (debugging) }; THE KILL SYSTEM CALL (XV6) /*** sysproc.c ***/ int sys_kill(void) { int pid; /*** syscall.c ***/ if(argint(0, &pid) < 0) return -1; static int (*syscalls[])(void) = { return kill(pid); [SYS_chdir] sys_chdir, } [SYS_close] sys_close, [SYS_dup] sys_dup, [SYS_exec] sys_exec, /*** proc.c ***/ [SYS_exit] sys_exit, [SYS_fork] sys_fork, // Kill the process with the given pid. [SYS_fstat] sys_fstat, // Process won't exit until it returns [SYS_getpid] sys_getpid, // to user space (see trap in trap.c). [SYS_kill] sys_kill, int [SYS_link] sys_link, kill(int pid) { [SYS_mkdir] sys_mkdir, struct proc *p; [SYS_mknod] sys_mknod, [SYS_open] sys_open, acquire(&ptable.lock); [SYS_pipe] sys_pipe, for(p = ptable.proc; p < &ptable.proc[NPROC]; p++){ [SYS_read] sys_read, if(p->pid == pid){ [SYS_sbrk] sys_sbrk, p->killed = 1; [SYS_sleep] sys_sleep, // Wake process from sleep if necessary. [SYS_unlink] sys_unlink, if(p->state == SLEEPING) [SYS_wait] sys_wait, p->state = RUNNABLE; [SYS_write] sys_write, release(&ptable.lock); [SYS_uptime] sys_uptime, return 0; }; } release(&ptable.lock); return -1; } 19 Fork

• pid_t fork(void); • Fork is used to create a new process. It creates a duplicate of the original process (including all file descriptors, registers, instruction pointer, etc’). • Once the call is finished, the process and its copy go their separate ways. Subsequent changes to one should not effect the other. • The fork call returns a different value to the original process (parent) and its copy (child): in the child process this value is zero, and in the parent process it is the PID of the child process. • When fork is invoked the parent’s information should be copied to its child – however, this can be wasteful if the child will not need this information (see exec()…). To avoid such situations, Copy On Write (COW) is used for the data section. Copy On Write (COW)

• How does manage COW?

fork() Parent Child Process Process DATA DATA STRUCTURE STRUCTURE (task_struct) (task_struct) write RW RWRO information

protection fault! Copying is expensive. Well, no other The child process will choice but to point to the parent’s allocate a new RW pages copy of each required page Process control

An example: Program flow: int i = 3472; printf("my process pid is %d\n",getpid()); PID = 8864 fork_id=fork(); if (fork_id==0){ i = 3472 i= 6794; printf(“child pid %d, i=%d\n",getpid(),i); } fork () else printf(“parent pid %d, i=%d\n",getpid(),i); return 0; PID = 8865 fork_id=0 fork_id = 8865 i = 6794 Output: i=3472 my process pid is 8864 child pid 8865, i=6794 parent pid 8864, i=3472 Process control

• Exit – A process can terminate itself using the exit system call – The call for the exit can be either explicit or implicit – The exit system call receives a single integer argument that will be the exit status of the process Process control - zombies

• When a process ends, the memory and resources associated with it are deallocated. • However, the entry for that process is not removed from the process table. • This allows the parent to collect the child’s exit status. • When this data is not collected by the parent the child is called a “zombie”. Such a leak is usually not worrisome in itself, however, it is a good indicator for problems to come. Process control

• Wait • pid_t wait(int *status); • pid_t waitpid(pid_t pid, int *status, int options); • The wait command is used for waiting on child processes whose state changed (the process terminated, for example). • The process calling wait will suspend execution until one of its children (or a specific one) terminates. • Waiting can be done for a specific process, a group of processes or on any arbitrary child with waitpid. • Once the status of a zombie process is collected that process is removed from the process table by the collecting process. Process control

• exec* • int execv(const char *, char *const argv[]); • int execvp(const char *file, char *const argv[]); • exec…. • The exec() family of function replaces current process image with a new process image (text, data, bss, stack, etc). • Since no new process is created, PID remains the same. • Exec functions do not return to the calling process unless an error occurred (in which case -1 is returned and errno is set with a special value). • The system call is execve(…) Process control – simple shell

#define… … int main(int argc, char **argv){ … while(true){ type_prompt(); read_command(command, params); pid=fork(); if (pid<0){ if (errno==EAGAIN) printf(“ERROR cannot allocate sufficient memory\n”); continue; } if (pid>0) wait(&status); else execvp(command,params); } Other system calls

• File Management: – open – close – read – write – lseek – etc… File management

• In POSIX operating systems files are accessed via a (Microsoft Windows uses a slightly different object: file handle). • A file descriptor is an integer specifying the index of an entry in the file descriptor table held by each process. • A file descriptor table is held by each process, and contains details of all open files. The following is an example of such a table:

FD Name Other information 0 Standard Input (stdin) … 1 Standard Output (stdout) …

2 Standard Error (stderr) … • File descriptors can refer to files, directories, sockets and a few more data objects. File management

• Open • int open(const char *pathname, int flags, mode_t mode); • Open returns a file descriptor for a given pathname. • This file descriptor will be used in subsequent system calls (according to the flags and mode) • Flags define the access mode: O_RDONLY (read only), O_WRONLY (write only), O_RDRW (read write). These can be bit-wised or’ed with more creation and status flags such as O_APPEND, O_TRUNC, O_CREAT. • Close • Int close(int fd); • Closes a file descriptor so it no longer refers to a file. • Returns 0 on success or -1 in case of failure (errno is set). File management

• Read • ssize_t read(int fd, void *buf, size_t count); • Attempts to read up to count bytes from the file descriptor fd, into the buffer buf. • Returns the number of bytes actually read (can be less than requested if read was interrupted by a , close to EOF, reading from pipe or terminal). • On error -1 is returned (and errno is set). • Note: The file position advances according to the number of bytes read. • Write • ssize_t write(int fd, const void *buf, size_t count); • Writes up to count bytes to the file referenced to by fd, from the buffer positioned at buf. • Returns the number of bytes actually wrote, or -1 (and errno) on error. File management

• lseek • off_t lseek(int fd, off_t offset, int whence); • This function repositions the offset of the file position of the file associated with fd to the argument offset according to the directive whence. • Whence can be set to SEEK_SET (directly to offset), SEEK_CUR (current+offset), SEEK_END (end+offset). • Positioning the offset beyond file end is allowed. This does not change the size of the file. • Writing to a file beyond its end results in a “hole” filled with ‘\0’ characters (null bytes). • Returns the location as measured in bytes from the beginning of the file, or -1 in case of error (and set errno). File management

• Dup • int dup(int oldfd); • int dup2(int oldfd, int newfd); • The dup commands create a copy of the file descriptor oldfd. • After a successful dup command is executed the old and new file descriptors may be used interchangeably. • They refer to the same open file descriptions and thus share information such as offset and status. That means that using lseek on one will also affect the other! • They do not share descriptor flags (FD_CLOEXEC). • Dup uses the lowest numbered unused file descriptor, and dup2 uses newfd (closing current newfd if necessary). • Returns the new file descriptor, or -1 in case of an error (and set errno). File management

Consider the following As a result (abstract): example: 0 stdin … 1 stdout … fileFD= open(“file.txt”…); 2 stderr … 3 file.txt … close(1); /* closes file handle 1, which is stdout.*/ fd =dup(fileFD); /* will create another file handle. File handle 1 is free, so it will be allocated. */ 0 stdin … close(fileFD); /* don’t need this descriptor 1 file.txt … anymore.*/ 2 stderr …

printf(“this did not go to stdout”); File management - example

#define… … #define RW_BLOCK 10

int main(int argc, char **argv){ int fdsrc, fddst; ssize_t readBytes, wroteBytes; char *buf[RW_BLOCK]; char *source = argv[1]; char *dest = argv[2];

fdsrc=open(source,O_RDONLY); if (fdsrc<0){ perror("ERROR while trying to open source file:"); exit(-1); } fddst=open(dest,O_RDWR|O_CREAT|O_TRUNC, 0666); if (fddst<0){ perror("ERROR while trying to open destination file:"); exit(-2); } File management - example

lseek(fddst,20,SEEK_SET); do{ readBytes=read(fdsrc, buf, RW_BLOCK); if (readBytes<0){ if (errno == EIO){ printf("I/O errors detected, aborting.\n"); exit(-10); } exit (-11); } wroteBytes=write(fddst, buf, readBytes); if (wroteBytes0); lseek(fddst,0,SEEK_SET); write(fddst,"\\*WRITE START*\\\n",16); close(fddst); close(fdsrc); return 0; } Fork – example (1)

How many lines of “Hello” will be printed in the following example:

int main(int argc, char **argv){ int i; for (i=0; i<10; i++){ fork(); printf(“Hello \n”); } return 0; } Fork – example (1)

How many lines of “Hello” will be printed Program flow: in the following example:

int main(int argc, char **argv){ int i; i=0

for (i=0; i<10; i++){

fork(); i=1 printf(“Hello \n”); } return 0; i=2 }

Total number of printf calls: Fork – example (2)

How many lines of “Hello” will be printed in the following example:

int main(int argc, char **argv){ int i; for (i=0; i<10; i++){ printf(“Hello \n”); fork(); } return 0; } Fork – example (2)

How many lines of “Hello” will be printed Program flow: in the following example: i=0 int main(int argc, char **argv){

int i; i=1

for (i=0; i<10; i++){

printf(“Hello \n”); i=2 fork(); } return 0; }

Total number of printf calls: Fork – example (3)

How many lines of “Hello” will be printed in the following example:

int main(int argc, char **argv){ int i; for (i=0; i<10; i++) fork(); printf(“Hello \n”); return 0; } Fork – example (3)

How many lines of “Hello” will be printed Program flow: in the following example:

int main(int argc, char **argv){ int i; i=0

for (i=0; i<10; i++)

fork(); i=1 printf(“Hello \n”); return 0; } i=2

Total number of printf calls: