FLSCHED: A Lockless and Lightweight Approach to OS Scheduler for Xeon Phi

Total Pages: 16

File Type: pdf, Size: 1020 KB

FLSCHED: A Lockless and Lightweight Approach to OS Scheduler for Xeon Phi

Heeseung Jo, Chonbuk National University, 567 Baekje-daero, Jeonju, Jeollabuk 54896, [email protected]
Woonhak Kang, Georgia Institute of Technology, 266 Ferst Dr, Atlanta, GA 30313, [email protected]
Changwoo Min, Virginia Tech, 302 Whittemore, Blacksburg, VA 24060, [email protected]
Taesoo Kim, Georgia Institute of Technology, 266 Ferst Dr, Atlanta, GA 30313, [email protected]

ABSTRACT

Processor manufacturers have increased the number of cores in a chip, and the latest manycore processor has up to 76 physical cores and 304 hardware threads. The evolution of the OS schedulers that manage processes, on the other hand, has been slow to catch up with emerging manycore processors.

In this paper, we show how much CFS, the default Linux scheduler, can break the performance of parallel applications on manycore processors (e.g., Intel Xeon Phi). We then propose a novel scheduler named FLSCHED, which is designed for a lockless implementation with fewer context switches and more efficient scheduling decisions. In our evaluations on Xeon Phi, FLSCHED shows better performance than CFS by up to 1.73× for HPC applications and 3.12× for micro-benchmarks.

ACM Reference format: Heeseung Jo, Woonhak Kang, Changwoo Min, and Taesoo Kim. 2017. FLSCHED: A Lockless and Lightweight Approach to OS Scheduler for Xeon Phi. In Proceedings of APSys '17, Mumbai, India, September 2, 2017, 8 pages. https://doi.org/10.1145/3124680.3124724

1 INTRODUCTION

Manycore processors are now prevalent in all types of computing devices, including mobile devices, servers, and hardware accelerators. For example, a single Xeon processor has up to 24 physical cores or 48 hardware threads [14], and a Xeon Phi processor has up to 76 physical cores or 304 hardware threads [22, 24]. In addition, due to increasingly important machine learning workloads, which are compute-intensive and massively parallel, we expect that the core count per system will increase further.

The prevalence of manycore processors imposes new challenges in scheduler design. First, schedulers should be able to handle an unprecedentedly high degree of parallelism. When the CFS scheduler was introduced, quad-core servers were dominant in data centers. Now, 32-core servers are standard in data centers [18], and servers with more than 100 cores are becoming popular [2]. Under such a high degree of parallelism, a small sequential part in a system can break the performance and scalability of an application: Amdahl's Law says that if the sequential part of an entire system increases from 1% to 2%, the maximum speedup drops significantly, from 50 times to 33 times. In particular, schedulers in the Linux kernel use various lock primitives, such as spinlock, mutex, and read-write semaphore, to protect their data structures (see Table 1). We found that those sequential parts in schedulers protected by locks significantly degrade the performance of massively parallel applications. The performance degradation becomes especially significant in communication-intensive applications, which need scheduler intervention (see Figure 1 and Figure 2).
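For concreteness, the 50× and 33× limits quoted above follow from Amdahl's Law on a machine with roughly 100 cores, a core count inferred from the numbers themselves (the excerpt does not state it):

```latex
% Amdahl's Law: speedup S on N cores with sequential fraction s.
% With N = 100 (assumed), s = 1% and s = 2% give the quoted 50x and 33x:
\[
  S(s, N) = \frac{1}{\,s + \frac{1 - s}{N}\,}, \qquad
  S(0.01, 100) \approx 50.3, \qquad
  S(0.02, 100) \approx 33.4.
\]
```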
Second, the cost of context switching keeps increasing as the amount of context that needs to be saved and restored grows. In Intel architectures, the width of the SIMD register file has increased from 128 bits to 256 bits and now to 512 bits in the XMM, YMM, and ZMM registers, respectively [19]. The problem is exaggerated when limited memory bandwidth is shared among many CPU cores with small caches, as on Xeon Phi processors. Recent schedulers adopt lazy optimization techniques, which do not save unchanged register files, to reduce the context-switching overhead [8, 27]. However, there is no way to avoid paying this high cost for compute-intensive applications, which rely heavily on SIMD operations for better performance.

Figure 1: Performance comparison of NPB benchmarks (bt, cg, ep, ft, is, mg, sp, ua) running 1,600 threads on a Xeon Phi with different process schedulers (CFS, FIFO, RR, and FL). Performance (OPS: operations per second) is normalized to that of CFS.

Figure 2: Execution time (s) and number of context switches of hackbench on a Xeon Phi with different process schedulers (CFS, FIFO, RR, and FL). We used the thread test mode while increasing the number of groups from 20 to 200 (x40 tasks each). Each group has 20 senders and 20 receivers communicating via pipe.

In this paper, we present FLSCHED, a new process scheduler that addresses the aforementioned problems. FLSCHED is designed for manycore accelerators like Xeon Phi. We adopt a lockless design to keep FLSCHED from becoming a sequential bottleneck. This is particularly critical for manycore accelerators, which have a large number of CPU cores; the Xeon Phi processor we used for the experiments in this paper has 57 cores or 228 hardware threads. FLSCHED is also designed to minimize the number of context switches. Because a Xeon Phi processor has 2× more vector register state than a Xeon processor (i.e., 32 512-bit registers for Xeon Phi versus 16 512-bit registers for Xeon), and its per-core memory bandwidth and cache size are smaller, its context-switching overhead is higher than a Xeon processor's [9]. It is therefore critical to avoid as many context switches as possible. Finally, FLSCHED is tailored to throughput-oriented workloads, which are dominant on manycore accelerators.
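As a back-of-the-envelope illustration of that asymmetry (our arithmetic sketch, not code from FLSCHED), the snippet below computes the raw vector-register state implied by the register counts above:

```c
#include <stdio.h>

/* Illustrative only: raw SIMD state that must be saved/restored on a
 * context switch, based on the register counts quoted in the paper. */
int main(void)
{
    const int reg_bits = 512;      /* ZMM register width                 */
    const int phi_regs = 32;       /* vector registers on Xeon Phi       */
    const int xeon_regs = 16;      /* vector registers on a Xeon         */

    printf("Xeon Phi vector state: %d bytes\n", phi_regs * reg_bits / 8);
    printf("Xeon     vector state: %d bytes\n", xeon_regs * reg_bits / 8);
    return 0;   /* prints 2048 vs. 1024 bytes per switch */
}
```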
This paper makes the following three contributions:

∙ We evaluate how the widely-used Linux schedulers (i.e., CFS, FIFO, and RR) perform on a manycore accelerator. We analyze their behavior especially in terms of spinlock contention, which increases the sequential portion of a scheduler, and the number of context switches.
∙ We design a new process scheduler, named FLSCHED, which is tailored to minimizing the number of context switches in a lockless fashion.
∙ We show the effectiveness of FLSCHED for real-world OpenMP applications and micro-benchmarks. In particular, FLSCHED outperforms all other Linux schedulers for the NAS Parallel Benchmark (NPB) by up to 73%.

The rest of this paper is organized as follows: §2 provides the motivation of our work with a case study, and §3 describes FLSCHED's design in detail. §4 evaluates and analyses FLSCHED's performance. §5 compares FLSCHED with previous research, and §6 concludes the paper.

2 CASE STUDY ON XEON PHI

In this section, we analyze how existing schedulers in the Linux kernel perform on manycore processors, especially a Xeon Phi processor. A lot of research effort has been put into making OSes scalable enough to fully utilize manycore processors [3–7, 12, 23]. Our focus in this paper is to investigate whether the existing schedulers in the Linux kernel are efficient and scalable enough to manage manycore processors. To this end, we evaluated the performance of three widely-used schedulers, CFS, FIFO, and RR, with high performance computing (HPC) applications and a communication-intensive micro-benchmark.

We first measured the performance of the NAS Parallel Benchmark (NPB) [1], which is written in OpenMP, running on the Xeon Phi. In particular, we ran the eight NPB benchmarks that fit in the Xeon Phi memory and measured operations per second (OPS). As Figure 1 shows, there is no clear winner among CFS, FIFO, and RR: for five benchmarks, FIFO and RR are better than CFS; for the other three, CFS is better than FIFO and RR. In contrast, FLSCHED shows better performance for all benchmarks except is. In particular, four benchmarks (i.e., cg, mg, sp, and ua) perform significantly better, by up to 1.73 times. To analyze scheduler behavior, we ran the perf profiling tool while running the benchmarks and found that spinlock contention in the schedulers can become a major scalability bottleneck. As Table 2 shows, for the four benchmarks on which FLSCHED performs significantly better, CFS, FIFO, and RR spend significantly more time on spinlock contention in the schedulers (around 8-15%) than FLSCHED does (around 3-5%). This shows that the increased sequential portion caused by lock contention in schedulers can significantly deteriorate the performance and scalability of applications.

To see how the number of context switches affects performance, we ran hackbench [11], a scheduler benchmark. We configured hackbench to use threads for parallelism and pipes for communication among threads. Figure 2 shows the execution time and the number of context switches on the Xeon Phi.

Table 1: Lock primitives used in the Linux scheduler core (CORE) and in each scheduling class (CFS, FIFO/RR, and FL).

Lock types                   CORE  CFS  FIFO/RR  FL
raw_spin_lock                  16    1       12   -
raw_spin_lock_irq/irqsave      13    5        2   -
rcu_read_lock                  14    5        1   -
spin_lock                       -    -        -   -
spin_lock_irq/irqsave          12    -        -   -
read_lock                       3    -        -   -
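To make the lock-contention effect tangible, here is a minimal userspace sketch (our illustration, not the paper's benchmark) in which every thread must pass through one spinlock-protected critical section, mirroring the serialization that the locks in Table 1 impose inside the kernel:

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define ITERS 1000000
#define MAX_THREADS 256

static pthread_spinlock_t lock;
static long counter;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        pthread_spin_lock(&lock);   /* serialized section: only one   */
        counter++;                  /* thread makes progress at a     */
        pthread_spin_unlock(&lock); /* time, regardless of core count */
    }
    return NULL;
}

int main(int argc, char **argv)
{
    int n = argc > 1 ? atoi(argv[1]) : 4;
    pthread_t tid[MAX_THREADS];

    if (n > MAX_THREADS)
        n = MAX_THREADS;
    pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE);
    for (int i = 0; i < n; i++)
        pthread_create(&tid[i], NULL, worker, NULL);
    for (int i = 0; i < n; i++)
        pthread_join(tid[i], NULL);
    printf("%d threads, counter = %ld\n", n, counter);
    return 0;
}
```

Compiled with gcc -O2 -pthread and timed at increasing thread counts, per-thread throughput collapses as contention grows, which is the same effect the paper observes inside the scheduler.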
Recommended publications
  • Demystifying the Real-Time Linux Scheduling Latency
Demystifying the Real-Time Linux Scheduling Latency

Daniel Bristot de Oliveira, Red Hat, Italy, [email protected]
Daniel Casini, Scuola Superiore Sant'Anna, Italy, [email protected]
Rômulo Silva de Oliveira, Universidade Federal de Santa Catarina, Brazil, [email protected]
Tommaso Cucinotta, Scuola Superiore Sant'Anna, Italy, [email protected]

Abstract: Linux has become a viable operating system for many real-time workloads. However, the black-box approach adopted by cyclictest, the tool used to evaluate the main real-time metric of the kernel, the scheduling latency, along with the absence of a theoretically-sound description of the in-kernel behavior, sheds some doubt on Linux meriting the real-time adjective. Aiming at clarifying the PREEMPT_RT Linux scheduling latency, this paper leverages the Thread Synchronization Model of Linux to derive a set of properties and rules defining the Linux kernel behavior from a scheduling perspective. These rules are then leveraged to derive a sound bound to the scheduling latency, considering all the sources of delays occurring in all possible sequences of synchronization events in the kernel. This paper also presents a tracing method, efficient in time and memory overheads, to observe the kernel events needed to define the variables used in the analysis. This results in an easy-to-use tool for deriving reliable scheduling latency bounds that can be used in practice. Finally, an experimental analysis compares cyclictest and the proposed tool, showing that the proposed method can find sound bounds faster with acceptable overheads.

2012 ACM Subject Classification: Computer systems organization → Real-time operating systems
Keywords and phrases: Real-time operating systems, Linux kernel, PREEMPT_RT, Scheduling latency
Digital Object Identifier: 10.4230/LIPIcs.ECRTS.2020.9
Supplementary Material: ECRTS 2020 Artifact Evaluation approved artifact available at https://doi.org/10.4230/DARTS.6.1.3
  • Context Switch in Linux – OS Course Memory Layout – General Picture
Context switch in Linux (OS course lecture slides, © Gabriel Kliot, Technion). The extracted slides consist mostly of stack diagrams; their titles and key points are:

- Memory layout, general picture: each process has a user-space stack and a kernel stack; the kernel stack and the task_struct live together in kernel memory, and the per-CPU TSS field tss->esp0 points to the top of the current process's kernel stack.
- #1, kernel stack after any system call, before a context switch: the user-mode SS, ESP, EFLAGS, CS, and EIP are pushed onto the kernel stack during the transition to kernel mode, followed by orig_eax and the segment and general-purpose registers (ES, DS, EAX, EBP, EDI, ESI, EDX, ECX, EBX) saved by the SAVE_ALL macro, with the schedule() function frame on top.
- #2, stack of prev before the switch_to macro in schedule(): the saved EAX, ECX, and EDX, the arguments to context_switch(), the return address into schedule(), and the old (schedule()'s) EBP.
- #3, switch_to saves ESI, EDI, and EBP on the stack of prev.
- #4, switch_to saves ESP in prev->thread.esp.
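As a runnable userspace analogue of the save-ESP/load-ESP dance in slides #2-#4 (our illustration, not material from the slides), the POSIX ucontext API swaps between two stacks in the same way:

```c
#include <stdio.h>
#include <ucontext.h>

/* Userspace analogue of a kernel context switch: swapcontext() saves the
 * current registers and stack pointer (like prev->thread.esp) and loads
 * another context's, resuming where that context previously left off. */
static ucontext_t main_ctx, co_ctx;

static void coroutine(void)
{
    printf("in coroutine, switching back\n");
    swapcontext(&co_ctx, &main_ctx);   /* save coroutine, resume main   */
    printf("coroutine resumed\n");
}

int main(void)
{
    char stack[64 * 1024];             /* the coroutine's own stack     */

    getcontext(&co_ctx);
    co_ctx.uc_stack.ss_sp = stack;
    co_ctx.uc_stack.ss_size = sizeof(stack);
    co_ctx.uc_link = &main_ctx;        /* where to resume on return     */
    makecontext(&co_ctx, coroutine, 0);

    printf("in main, switching to coroutine\n");
    swapcontext(&main_ctx, &co_ctx);   /* save main, run coroutine      */
    printf("back in main, resuming coroutine\n");
    swapcontext(&main_ctx, &co_ctx);   /* resume coroutine at its save  */
    printf("done\n");
    return 0;
}
```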
• What Is an Operating System III 2.1 Components II
What is an Operating System III, 2.1 Components II

An operating system (OS) is software that manages computer hardware and software resources and provides common services for computer programs. The operating system is an essential component of the system software in a computer system. Application programs usually require an operating system to function.

Memory management. Among other things, a multiprogramming operating system kernel must be responsible for managing all system memory currently in use by programs. This ensures that a program does not interfere with memory already in use by another program. Since programs time-share, each program must have independent access to memory.

Cooperative memory management, used by many early operating systems, assumes that all programs make voluntary use of the kernel's memory manager and do not exceed their allocated memory. This system of memory management is almost never seen any more, since programs often contain bugs that cause them to exceed their allocated memory. If a program fails, it may cause memory used by one or more other programs to be affected or overwritten. Malicious programs or viruses may purposefully alter another program's memory, or may affect the operation of the operating system itself. With cooperative memory management, it takes only one misbehaved program to crash the system.

Memory protection enables the kernel to limit a process's access to the computer's memory. Various methods of memory protection exist, including memory segmentation and paging. All methods require some level of hardware support (such as the 80286 MMU), which doesn't exist in all computers.
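As a hands-on illustration of the memory protection described above (our example, assuming a POSIX system), mprotect() asks the kernel to change a page's permissions; a write to a read-only page would then be blocked by the paging hardware:

```c
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Kernel-enforced memory protection: a page mapped read-only cannot be
 * written; mprotect() changes the permission on a whole page. */
int main(void)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, pagesz, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return 1;
    strcpy(p, "hello");
    mprotect(p, pagesz, PROT_READ);          /* now read-only          */
    printf("read ok: %s\n", p);
    /* p[0] = 'H'; would now fault with SIGSEGV: the MMU blocks it.    */
    mprotect(p, pagesz, PROT_READ | PROT_WRITE);
    p[0] = 'H';                              /* writable again         */
    printf("write ok: %s\n", p);
    munmap(p, pagesz);
    return 0;
}
```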
  • Hiding Process Memory Via Anti-Forensic Techniques
DIGITAL FORENSIC RESEARCH CONFERENCE: Hiding Process Memory via Anti-Forensic Techniques. By Frank Block (Friedrich-Alexander Universität Erlangen-Nürnberg (FAU) and ERNW Research GmbH) and Ralph Palutke (Friedrich-Alexander Universität Erlangen-Nürnberg). From the proceedings of the Digital Forensic Research Conference, DFRWS USA 2020, July 20-24, 2020.

DFRWS is dedicated to the sharing of knowledge and ideas about digital forensics research. Ever since it organized the first open workshop devoted to digital forensics in 2001, DFRWS has continued to bring academics and practitioners together in an informal environment. As a non-profit, volunteer organization, DFRWS sponsors technical working groups, annual conferences and challenges to help drive the direction of research and development. https://dfrws.org

Authors: Ralph Palutke, Frank Block, Patrick Reichenberger, Dominik Stripeika (Friedrich-Alexander Universität Erlangen-Nürnberg (FAU), Germany; ERNW Research GmbH, Heidelberg, Germany). Published in Forensic Science International: Digital Investigation 33 (2020) 301012, Proceedings of the Twentieth Annual DFRWS USA.

Abstract: Nowadays, security practitioners typically use memory acquisition or live forensics to detect and analyze sophisticated malware samples. Subsequently, malware authors began to incorporate anti-forensic techniques that subvert the analysis process by hiding malicious memory areas. Those techniques typically modify characteristics, such as access permissions, or place malicious data near legitimate data, in order to prevent the memory from being identified by analysis tools while still remaining accessible.

Keywords: Memory subversion
  • Context Switch Overheads for Linux on ARM Platforms
Context Switch Overheads for Linux on ARM Platforms. Francis David ([email protected]), Jeffrey Carlyle ([email protected]), Roy Campbell ([email protected]). http://choices.cs.uiuc.edu

Outline: What is a context switch? (overheads; context switching in Linux); interrupt handling overheads; the ARM experimentation platform; context switching (experiment setup, results); interrupt handling (experiment setup, results).

What is a context switch? Storing the current processor state and restoring another. It is the mechanism used for multi-tasking; a user-kernel transition is only a processor mode switch, not a context switch.

Sources of overhead: time spent saving and restoring processor state; pollution of processor caches; switching between different processes (virtual memory maps need to be switched; synchronization of memory caches); paging.

Context switching in Linux: a context switch can be implemented in userspace or in kernelspace. New 2.6 kernels use the Native POSIX Thread Library (NPTL), which uses a one-to-one mapping of userspace threads to kernel threads. The experiments use kernel 2.6.20.

Interrupt handling: an interruption of normal program flow in which virtual memory maps are not switched, but which still causes overheads: save and restore of processor state, and perturbation of processor caches.
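A common way to estimate the cost these slides measure, shown here as our own sketch rather than the authors' setup, is a pipe ping-pong between two processes; each round trip forces two context switches:

```c
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define ROUNDS 100000

/* Two processes bounce one byte over a pipe pair; elapsed/(2*ROUNDS)
 * approximates the per-switch cost. For stable numbers, pin both
 * processes to one CPU (e.g., run under taskset -c 0). */
int main(void)
{
    int p2c[2], c2p[2];
    char b = 0;
    struct timespec t0, t1;

    if (pipe(p2c) || pipe(c2p))
        return 1;
    if (fork() == 0) {                      /* child: echo loop        */
        for (int i = 0; i < ROUNDS; i++) {
            if (read(p2c[0], &b, 1) != 1 || write(c2p[1], &b, 1) != 1)
                _exit(1);
        }
        _exit(0);
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < ROUNDS; i++) {      /* parent: ping            */
        if (write(p2c[1], &b, 1) != 1 || read(c2p[0], &b, 1) != 1)
            return 1;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("~%.0f ns per switch\n", ns / (2.0 * ROUNDS));
    return 0;
}
```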
  • Lecture 4: September 13 4.1 Process State
CMPSCI 377 Operating Systems, Fall 2012. Lecture 4: September 13. Lecturer: Prashant Shenoy. TAs: Sean Barker & Demetre Lavigne.

4.1 Process State

4.1.1 Process

A process is a dynamic instance of a computer program that is being sequentially executed by a computer system that has the ability to run several computer programs concurrently. A computer program itself is just a passive collection of instructions, while a process is the actual execution of those instructions. Several processes may be associated with the same program; for example, opening up several windows of the same program typically means more than one process is being executed. The state of a process consists of: the code for the running program (text segment), its static data, its heap and the heap pointer (HP) where dynamic data is kept, the program counter (PC), the stack and the stack pointer (SP), the values of CPU registers, the set of OS resources in use (list of open files, etc.), and the current process execution state (new, ready, running, etc.). Some state may be stored in registers, such as the program counter.

4.1.2 Process Execution States

Processes go through various process states which determine how the process is handled by the operating system kernel. The specific implementations of these states vary in different operating systems, and the names of these states are not standardised, but the general high-level functionality is the same. When a process is first started/created, it is in the new state. It needs to wait for the process scheduler (of the operating system) to set its status to "new" and load it into main memory from a secondary storage device (such as a hard disk or a CD-ROM).
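The state listed in 4.1.1 maps naturally onto a process control block; the following toy struct (field names are our own illustration, not from any real kernel) simply restates that list in C:

```c
/* Toy process control block reflecting the state listed in the lecture;
 * names and sizes are illustrative, not taken from a real kernel. */
enum proc_state { NEW, READY, RUNNING, WAITING, TERMINATED };

struct pcb {
    enum proc_state state;   /* current execution state              */
    void *text;              /* code for the running program         */
    void *static_data;       /* static data segment                  */
    void *heap, *hp;         /* heap and heap pointer                */
    void *stack, *sp;        /* stack and stack pointer              */
    void *pc;                /* program counter                      */
    unsigned long regs[16];  /* saved CPU register values            */
    int open_files[64];      /* OS resources in use (open files)     */
};
```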
  • IBM Power Systems Virtualization Operation Management for SAP Applications
Front cover: IBM Power Systems Virtualization Operation Management for SAP Applications. Dino Quintero, Enrico Joedecke, Katharina Probst, Andreas Schauberer. IBM Redbooks Redpaper, First Edition (March 2020), REDP-5579-00. © Copyright International Business Machines Corporation 2020. All rights reserved.

This edition applies to the following products: Red Hat Enterprise Linux 7.6, Red Hat Virtualization 4.2, SUSE Linux SLES 12 SP3, HMC V9 R1.920.0, Novalink 1.0.0.10, ipmitool V1.8.18.

Contents: Notices; Trademarks; Preface (Authors; Now you can become a published author, too!; Comments welcome; Stay connected to IBM Redbooks); Chapter 1. Introduction (1.1 Preface); Chapter 2. Server virtualization (2.1 Introduction; 2.2 Server and hypervisor options: 2.2.1 Power Systems models that support PowerVM versus KVM, 2.2.2 Overview of POWER8 and POWER9 processor-based hardware models, 2.2.3 Comparison of PowerVM and KVM/RHV; 2.3 Hypervisors: 2.3.1 Introducing IBM PowerVM, 2.3.2 Kernel-based virtual machine introduction, 2.3.3 Resource overcommitment, 2.3.4 Red Hat Virtualization); Chapter 3. IBM PowerVM management and operations (3.1 Shared processor logical partitions: 3.1.1 Configuring a shared processor LPAR
  • Lecture 6: September 20 6.1 Threads
CMPSCI 377 Operating Systems, Fall 2012. Lecture 6: September 20. Lecturer: Prashant Shenoy. TAs: Sean Barker & Demetre Lavigne.

6.1 Threads

A thread is a sequential execution stream within a process. This means that a single process may be broken up into multiple threads. Each thread has its own program counter, registers, and stack, but they all share the same address space within the process. The primary benefit of such an approach is that a process can be split up into many threads of control, allowing for concurrency, since some parts of the process can make progress while others are busy. Sharing an address space within a process allows for simpler coordination than message passing or shared memory. Threads can also be used to modularize a process: a process like a web browser may create one thread for each browsing tab, or create threads to run external plugins.

6.1.1 Processes vs threads

One might argue that in general processes are more flexible than threads. For example, processes are controlled independently by the OS, meaning that if one crashes it will not affect other processes. However, processes require explicit communication using either message passing or shared memory, which may add overhead since it requires support from the OS kernel. Using threads within a process allows them all to share the same address space, simplifying communication between threads. However, threads have their own problems: because they communicate through shared memory they must run on the same machine, and care must be taken to create thread-safe code that functions correctly despite multiple threads acting on the same set of shared data.
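A minimal pthreads example of the sharing described above (ours, not from the lecture): both threads read and write the same global variable, while each keeps a private stack variable:

```c
#include <pthread.h>
#include <stdio.h>

int shared = 0;                  /* one copy, visible to all threads   */

static void *worker(void *arg)
{
    int local = *(int *)arg;     /* lives on this thread's own stack   */
    shared += local;             /* unsynchronized for brevity; real   */
                                 /* code needs a mutex or atomics      */
    printf("thread %d: &local=%p &shared=%p\n",
           local, (void *)&local, (void *)&shared);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    int a = 1, b = 2;

    pthread_create(&t1, NULL, worker, &a);
    pthread_create(&t2, NULL, worker, &b);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("shared = %d\n", shared);  /* same address space: both updates land */
    return 0;
}
```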
  • Mitigating the Performance-Efficiency Tradeoff in Resilient Memory Disaggregation
Mitigating the Performance-Efficiency Tradeoff in Resilient Memory Disaggregation

Youngmoon Lee (University of Michigan), Hasan Al Maruf (University of Michigan), Mosharaf Chowdhury (University of Michigan), Asaf Cidon (Columbia University), Kang G. Shin (University of Michigan)

ABSTRACT: We present the design and implementation of a low-latency, low-overhead, and highly available resilient disaggregated cluster memory. Our proposed framework can access erasure-coded remote memory within a single-digit µs read/write latency, significantly improving the performance-efficiency tradeoff over the state-of-the-art: it performs similarly to in-memory replication with 1.6× lower memory overhead. We also propose a novel coding group placement algorithm for erasure-coded data that provides load balancing while reducing the probability of data loss under correlated failures by an order of magnitude.

1 INTRODUCTION: To address the increasing memory pressure in datacenters, two

State-of-the-art solutions take three primary approaches: (i) local disk backup [36, 66], (ii) remote in-memory replication [31, 50], and (iii) remote in-memory erasure coding [61, 64, 70, 73] and compression [45]. Unfortunately, they suffer from some combination of the following problems. High latency: the first approach has no additional memory overhead, but the access latency is intolerably high in the presence of any of the aforementioned failure scenarios. The systems that take the third approach do not meet the single-digit µs latency requirement of disaggregated cluster memory even when paired with RDMA (Figure 1). High cost: while the second approach has low latency, it doubles memory consumption as well as network bandwidth requirements. The first and second approaches represent the two extreme points in the performance-vs-efficiency tradeoff space for resilient cluster memory (Figure 1).
  • Virtual Memory and Linux
Virtual Memory and Linux. Matt Porter, Embedded Linux Conference Europe, October 13, 2016. About the original author, Alan Ott: unfortunately, he was unable to be at ELCE 2016. He is a veteran embedded systems and Linux developer and Linux architect at SoftIron (64-bit ARM servers and data center appliances; a hardware company, strong on software; Overdrive 3000, with more products in process).

Physical memory, single address space: simple systems have a single address space that memory and peripherals share, with memory mapped to one part and peripherals mapped to another. All processes and the OS share the same memory space, with no memory protection: processes can stomp one another, and user space can stomp kernel memory. CPUs with a single address space include the 8086-80286, ARM Cortex-M, 8- and 16-bit PIC, AVR, SH-1 and SH-2, and most 8- and 16-bit systems.

x86 physical memory map: lots of legacy. RAM is split (DOS area and extended), hardware is mapped between the RAM areas, and high and extended memory are accessed differently.

Limitations: portable C programs expect flat memory; multiple memory access methods limit portability; management is tricky (you need to know or detect total RAM and keep processes separated); and there is no protection, so rogue programs can corrupt the entire system.

What is virtual memory? Virtual memory is a system that uses an address mapping from virtual address space to physical address space: virtual addresses are mapped to physical RAM or to hardware devices (PCI devices, GPU RAM, on-SoC IP blocks). Advantages: each process can have a different memory mapping, so one process's RAM is inaccessible (and invisible) to other processes.
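A small demonstration of the per-process mappings the slides describe (our example, not from the slides): after fork(), parent and child print the same virtual address for a global variable yet observe different values, because each process has its own virtual-to-physical mapping:

```c
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int value = 1;   /* same virtual address in parent and child after fork */

int main(void)
{
    if (fork() == 0) {
        value = 42;               /* touches only the child's mapping   */
        printf("child:  &value=%p value=%d\n", (void *)&value, value);
        _exit(0);
    }
    wait(NULL);
    /* Same virtual address, different contents: the parent's mapping
     * is untouched by the child's write. */
    printf("parent: &value=%p value=%d\n", (void *)&value, value);
    return 0;
}
```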
• Programs • Processes • Context Switching • Protected Mode Execution • Inter-Process Communication • Threads
Topics: Programs; Processes; Context Switching; Protected Mode Execution; Inter-process Communication; Threads.

Running dynamic code: one basic function of an OS is to execute and manage code dynamically, e.g., a command issued at a command-line terminal, an icon double-clicked from the desktop, or jobs/tasks run as part of a batch system (MapReduce). A process is the basic unit of a program in execution.

How to run a program? When you double-click on an .exe, how does the OS turn the file on disk into a process? What information must the .exe file contain in order to run as a program?

Program formats: programs obey specific file formats. CP/M and DOS use COM executables (*.com); DOS uses MZ executables (*.exe), named after Mark Zbikowski, a DOS developer; Windows uses the Portable Executable format (PE, PE32+) (*.exe), a modified version of the Unix COFF executable format, and PE files start with an MZ header (why?); Unix/Linux uses the Executable and Linkable Format (ELF); Mac OS X uses the Mach object file format (Mach-O).

test.c:

```c
#include <stdio.h>

int big_big_array[10 * 1024 * 1024];
char *a_string = "Hello, World!";
int a_var_with_value = 100;

int main(void)
{
    big_big_array[0] = 100;
    printf("%s\n", a_string);
    a_var_with_value += 20;
    printf("main is : %p\n", &main);
    return 0;
}
```

ELF file format: the ELF header contains compatibility info and the entry point of the executable code; the program header table lists all the segments in the file and is used to load and execute the program; the section header table is used by the linker.

ELF header format: the entry point of the executable; the excerpt breaks off at the header's typedef struct declaration.
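Since that struct is cut off, here is a small reader (our example, using the standard Elf64_Ehdr from glibc's <elf.h>; the slides likely show the 32-bit Elf32_Ehdr variant) that prints the header fields the slides describe:

```c
#include <elf.h>
#include <stdio.h>

/* Read an ELF file's header and print the entry point, the field the
 * slide highlights, plus the program/section header table sizes. */
int main(int argc, char **argv)
{
    Elf64_Ehdr eh;
    FILE *f = fopen(argc > 1 ? argv[1] : "/bin/ls", "rb");

    if (!f || fread(&eh, sizeof(eh), 1, f) != 1)
        return 1;
    if (eh.e_ident[EI_MAG0] != ELFMAG0)   /* first byte of 0x7f 'E' 'L' 'F' */
        return 1;
    printf("entry point: 0x%lx, %d program headers, %d sections\n",
           (unsigned long)eh.e_entry, eh.e_phnum, eh.e_shnum);
    fclose(f);
    return 0;
}
```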
  • Automata-Based Formal Analysis and Verification of the Real-Time Linux Kernel
Universidade Federal de Santa Catarina; Scuola Superiore Sant'Anna; Red Hat, Inc. Ph.D. in Automation and Systems Engineering; Ph.D. in Emerging Digital Technologies (Embedded and Real-Time Systems); R&D Platform, Core-Kernel, Real-Time. Ph.D. Candidate: Daniel Bristot de Oliveira. Ph.D. Supervisors: Tommaso Cucinotta and Rômulo Silva de Oliveira. Automata-based Formal Analysis and Verification of the Real-Time Linux Kernel. Pisa, 2020.

ACKNOWLEDGEMENTS

In the first place, I would like to thank my friends and family for supporting me during the development of this research. In particular, I would like to thank Alessandra de Oliveira for motivating me to pursue a Ph.D. degree since I was a teenager, and also for allowing me to use her computer, which was "my" first computer. I also would like to thank Bianca Cartacci for her unconditional support in the hardest days of these last years.

I would like to thank Red Hat, Inc. for allowing me to continue my academic research concurrently with my professional life. I would also like to thank the professionals who motivated me to remain in academia and supported this research, including Nelson Campaner, Steven Rostedt, Juri Lelli and Arnaldo Carvalho de Mello. In particular, I would like to thank Clark Williams, not only for his effort in supporting my research at Red Hat but, mainly, for his advice and personal support.

I would like to thank my Ph.D. colleagues from UFSC and Sant'Anna, in particular Karila Palma Silva, Marco Pagani, Paolo Pazzaglia and Daniel Casini, all the members of the Retis Lab, and the administrative staff of the TeCIP institute at Sant'Anna and the PPGEAS at UFSC for their support with the cotutela agreement, in particular Enio Snoeijer.