<<

ENABLING USER-MODE PROCESSES IN THE TARGET OS

FOR CSC 159 PRAGMATICS

A Project

Presented to the faculty of the Department of Science

California State University, Sacramento

Submitted in partial satisfaction of the requirements for the degree of

MASTER OF SCIENCE

in

Computer Science

by

Michael Bradley Colson

SPRING 2018

© 2018

Michael Bradley Colson

ALL RIGHTS RESERVED

ii

ENABLING USER-MODE PROCESSES IN THE TARGET OS

FOR CSC 159 OPERATING SYSTEM PRAGMATICS

A Project

by

Michael Bradley Colson

Approved by:

______, Committee Chair Dr. Weide Chang

______, Second Reader Dr. Yuan Cheng

______Date

iii

Student: Michael Bradley Colson

I certify that this student has met the requirements for format contained in the University format manual, and that this project is suitable for shelving in the Library and credit is to be awarded for the project.

______, Graduate Coordinator ______Dr. Jinsong Ouyang Date

Department of

iv

Abstract

of

ENABLING USER-MODE PROCESSES IN THE TARGET OS

FOR CSC 159 OPERATING SYSTEM PRAGMATICS

by

Michael Bradley Colson

One of the main responsibilities of an Operating System (OS) is to ensure system stability and security. Most OSes utilize mechanisms built into the CPU hardware to prevent user-mode processes from altering the state of the system or accessing protected spaces and hardware I/O ports. A should only read from or write to memory addresses in its . In order to accomplish the above, the CPU is typically designed to operate in two or more modes with different levels. The OS typically runs in the mode with highest privilege level, which enables it to execute any instruction and gives it access to all the memory in the system. To improve system reliability, security and stability, applications usually run in the mode with the lowest privilege level.

At Sacramento State University, CSC 159 (Operating System Pragmatics) students learn to develop an OS using the SPEDE (System Programmer’s Educational

Development Environment) framework. SPEDE provides an environment to build an OS image and download it to a target system where it is executed as the local OS.

While SPEDE provides an excellent development environment for students to learn how v

to develop an OS, one limitation is it does not currently support running processes in

user-mode (only kernel-mode is supported). In order to give students a better

understanding of how an OS can provide security and protection, it is necessary to enable

the support for user-mode processes.

The goals of this project are to establish a supportive component for the creation of user-mode processes, enabling and enforcing for the process runtime and handling of runtime exceptions in a virtual system,

such as page faults and general protection faults in the OS kernel. In order to achieve

these goals, additional kernel data structures must be implemented (e.g. paging table

structures, a , modified process frame, etc.) along with handlers

for page faults and general protection faults.

______, Committee Chair Dr. Weide Chang

______Date

vi

ACKNOWLEDGEMENTS

I would like to thank the project sponsor, Dr. Weide Chang for his guidance and support on this project. Dr. Chang took time out of his busy schedule to provide me with valuable advice and feedback on all phases of the project. His knowledge of Operating

Systems and played a significant role in the successful completion of this project.

I would also like to thank Dr. Yuan Cheng for being the second reader for this project. His guidance and feedback were extremely helpful in completing all the requirements for the project in a timely manner.

I would also like to thank the entire CSUS Computer Science department and faculty for supporting me to become a successful student.

vii

TABLE OF CONTENTS Page

Acknowledgements ...... vii

List of Tables ...... xi

List of Figures ...... xii

Chapter

1. INTRODUCTION ...... 1

1.1 Project Overview ...... 1

1.2 Project Objectives ...... 2

1.3 Related Work ...... 3

1.4 Expected Results ...... 4

1.5 Background ...... 5

2. OVERVIEW OF THE CPU ARCHITECTURE ...... 9

2.1 /AMD x86 Architecture Overview ...... 9

2.2 Structures ...... 12

2.3 Task Switching ...... 14

2.4 and Address Translation ...... 18

2.5 Exceptions and ...... 21

3. OVERVIEW OF THE SPEDE X86 DEVELOPMENT ENVIRONMENT ...... 25

3.1 Overview of SPEDE ...... 25

3.2 FLASH – Flames ...... 30

3.3 Flint – Flames Interface ...... 31 viii

3.4 GNU ...... 32

3.5 Virtual SPEDE ...... 33

4. METHODOLOGY ...... 36

4.1 Creating User-Mode and Kernel-Mode Processes ...... 36

4.2 Dispatching User-Mode and Kernel-Mode Processes ...... 38

4.3 Memory Protection for User-Mode Processes ...... 39

4.4 Handling Exceptions and Faults Generated by User-Mode Processes ..... 39

5. IMPLEMENTATION ...... 41

5.1 Organization of the Target OS Source Code ...... 41

5.2 Kernel Data Structures Overview ...... 43

5.3 Kernel Data Structures to Enable User-Mode Processes ...... 44

5.4 Kernel Task State Segment Implementation ...... 46

5.5 API for Creating and Dispatching User-Mode Processes ...... 47

5.6 API for Creating and Dispatching Kernel-Mode Processes ...... 49

5.7 Handling Software Interrupts Generated by User-Mode Processes ...... 50

5.8 Page-Fault Handler Implementation ...... 50

5.9 General-Protection Fault Handler Implementation ...... 53

6. TESTING ...... 54

6.1 Testing Approach ...... 54

6.2 Test Cases ...... 56

7. RESULTS AND EVALUATION...... 60

8. CONCLUSION AND FUTURE WORK ...... 63 ix

Appendix A. SOURCE CODE ...... 65

References ...... 76

x

LIST OF TABLES Tables Page

1. Segment Descriptors Added to the GDT in the Target OS ...... 27

2. SPEDE Library Files...... 28

3. Target OS Source Files ...... 42

4. Target OS Kernel Data Structures ...... 43

5. Test Cases to Verify the Software Requirements ...... 56

xi

LIST OF FIGURES Figures Page

1. General-Purpose and Segment Registers in an x86 CPU ...... 12

2. Protection Rings in an x86 CPU ...... 14

3. 32- x86 Task State Segment ...... 15

4. Paging Structures in an x86 Computer ...... 20

5. Stack Contents after an Exception or ...... 24

6. FLASH Downloads the OS Image from the Host to the Target ...... 26

7. SPEDE Environment in the CSC 159 Lab ...... 30

8. GDB Invoked from Flash on the Host System ...... 33

9. Virtual SPEDE Environment ...... 34

10. Flowchart for Page-Fault Handler ...... 52

11. Flowchart for General-Protection Fault Handler ...... 53

12. State of the Register File after Switching to User-Mode ...... 61

xii

1

Chapter 1

INTRODUCTION

1.1 Project Overview

In CSC 159 (Operating System Pragmatics) at Sacramento State University, students learn how to write a simple OS from scratch. The purpose of the course is to give students a good foundation for how to design and implement an OS and to get a deeper understanding of OS internals. CSC 159 students learn how to implement algorithms and data structures for basic process management, synchronization mechanisms, paging, virtual memory, inter-process communication, a simulated and primitive device drivers for a printer and serial port. Early on in the course, students implement processes by creating small functions (e.g. UserProc()) which can be dispatched by the kernel. Mode switching is not currently implemented as part of the OS framework provided to the students so all processes (and the kernel) run in kernel-mode. This is not how a typical OS implements process creation and dispatching. Typically, an OS will create a process with access to user-mode segments and dispatch the process in user- mode (ring three).

Running processes in user-mode in the target OS will provide CSC 159 students with a more realistic example of how an OS can enable and enforce protection. Nearly all modern OSes utilize two or more different privilege levels provided by the CPU hardware and run applications in the mode with the lowest privilege level. Running processes in user-mode is essential for system reliability, stability and security. A user-

2

mode process can only execute non-privileged instructions. If it attempts to execute a privileged instruction, a general-protection fault will be generated, and control will be transferred to the OS. In most modern OSes, if a user-mode process attempts to execute a privileged instruction or access memory outside of its address space, the OS kills the process and informs the user of the reason for terminating the process.

As an example of why user-mode is essential for system stability, suppose a user- mode process could execute any instruction. This would allow a buggy or malicious process to halt the processor by executing the (halt) instruction or disable interrupts using the cli (clear ) instruction. Halting the CPU would put the system into a halted state until a reset or interrupt is asserted [1]. Disabling interrupts would prevent the OS from arbitrating resources among the running processes. Both cases would be catastrophic on enterprise-class server systems where availability and uptime is critical. Running processes in user-mode is also important for security because without it a process could read from or write to another processes’ memory or alter kernel data structures. Protecting memory from unauthorized reads or writes is facilitated by user-

mode and setting up page tables with the proper permissions.

1.2 Project Objectives

The primary objectives of this project are:

1) Use SPEDE to implement switching between kernel-mode and user-mode

(and vice-versa) in the target OS without any faults, exceptions or crashes.

3

2) Set up page tables for each user-mode process with the proper permissions to

enable and enforce memory protection.

3) protection faults by user-mode processes (e.g. illegal memory access

or an attempt to execute a privileged instruction) gracefully in the target OS

kernel using a page-fault handler and a general-protection fault handler.

1.3 Related Work

Some work on implementing user-mode processes in the target OS created by

SPEDE has been done already. The authors of the CSC 159 Lab Manual (Brian Witt and

Dr. John Clevenger) wrote an eight-page document on how user-mode processes can be implemented in the target OS [2]. In their document, they describe a mechanism to implement mode switching by first creating and populating a Task State Segment (TSS) for the kernel and the newly created user-mode process. The task switch is achieved by linking the kernel task’s TSS to the user-mode task’s TSS and executing the iret instruction to switch to the user-mode task. The Nested Task (NT) bit must be set in the

EFLAGS register in order to switch tasks in this way. This essentially tricks the CPU into thinking that the kernel task was called by the user-mode task and therefore must switch back to the user-mode task after returning from the kernel task.

To implement user-mode processes in the target OS, it is necessary to add at least two additional segment descriptors to the (GDT) for the user- mode process’s code, data and stack segments. One segment descriptor is needed for the user-mode , and one is needed for the user-mode data and stack segments.

4

The descriptor privilege level (DPL) of the user-mode segments must be set to three to indicate that they can be accessed by user-mode processes. In anticipation that user-mode processes would eventually be implemented in the target OS, the SPEDE developers added code to the target boot which automatically adds three user-mode segment descriptors to the GDT when the target OS boots up [2].

1.4 Expected Results

The primary goal of this project is to enable mode switching during process dispatch, system calls, hardware interrupts and exceptions. If implemented correctly, the expected results are that the privilege level changes occur smoothly with no unexpected exceptions, faults, errors or crashes. With the help of the GNU Debugger (GDB), the state of the registers can be examined before and after a process is dispatched. If a process successfully switches to ring three, the segment registers should point to the user- mode code, data and stack segments. Additionally, the stack pointer needs to point to the user-mode stack after switching to ring three. When transitioning from ring three to ring zero during a hardware interrupt, software interrupt (e.g. ) or exception, the segment registers should point to the kernel-mode code, data and stack segments.

Furthermore, the stack pointer should point to the process’s kernel stack.

When a page-fault or general-protection fault occurs, it is expected that the proper exception handler routine will be invoked and make its best effort to recover from the exception. In some cases, recovery will not be possible, so the kernel will panic and display a message on the console that shows diagnostic information, such as the type of

5

exception, the current run PID, error code, etc. In other cases, the process that caused the exception will be terminated and the system will continue running after selecting another process to run from the run queue. The behavior of the exception handler will depend on the error code pushed on the stack when the page-fault or general-protection fault occurs.

1.5 Background

The following is a list of key terms and definitions relevant to this project:

Process: an executing program. A process consists of multiple sections including a

text section (executable code), a data section (global and static variables), a stack

section (used for procedure calls) and a heap section (used for dynamic memory

allocation).

x86: a CISC instruction set architecture based on the processor. The x86

family of processors includes the , 80386, 80486, and most of the

newer Intel processor models. Most Intel and AMD processors use the x86 instruction

set architecture [3].

Segmentation: a memory model that partitions memory into multiple regions

primarily for protection and isolation. Segments are typically defined by segment

descriptors that contain the base address, limit (size) and other attributes such as

privilege level. Segments can be of different sizes and can be defined for a process’s

code, data and/or stack sections.

6

Paging: a memory model where memory is divided into fixed sized regions called

pages (typically 4 KB or 8 KB in size). Nowadays, paging is used more commonly

than segmentation. Paging is typically implemented using special hardware registers

in the CPU and page tables in memory.

Multitasking OS: an OS that is capable of running multiple tasks (processes) by

storing them in different areas of memory. A multitasking OS only runs one task at a

time on the CPU but switches among all runnable tasks rapidly giving the illusion that

they are running simultaneously.

User-Mode: a lower privileged CPU mode that is designed for ,

libraries and OS utilities (e.g. shells, , etc.). A process executing in user- mode only has access to a subset of the instructions and memory.

Kernel-Mode: a higher privileged CPU mode that is designed for the OS kernel. In kernel-mode, all instructions and memory are accessible.

Real Mode: the x86 CPU mode that is enabled when the system is powered on. It allows access of up to 1 MB of memory and memory segments are limited to 64 KB.

This mode is mainly used for with older CPU models such as the Intel 8086 and 80186 models [4].

7

Protected Mode: an x86 CPU mode that enables access to 4 GB of memory on a 32-

bit system (64 GB can be accessed using the Page Address Extension mechanism)

[5]. Protected mode also supports paging, virtual memory and additional instructions

to facilitate multitasking and protection [4]. Protected mode was first introduced in

the Intel 80286 CPU [6].

Protection Ring: in x86 CPUs, different privilege levels are referred to as protection

rings. The Intel x86 architecture defines four levels: 0 (highest

privilege, used by the kernel), 1 and 2 (usually are not used but can be used for device

drivers or other low-level OS services) and 3 (lowest privilege, used by applications).

Global Descriptor Table: an x86 protected mode data structure used to store global

(system-wide) descriptors, such as segment descriptors, Task State Segment (TSS) descriptors, Local Descriptor Table (LDT) descriptors and descriptors. Each descriptor contains attributes that define the characteristics of the corresponding memory region, TSS, LDT, call gate, etc. For example, segment descriptors contain a

base address, limit (size), privilege level and other attributes [7].

Local Descriptor Table: an x86 protected mode data structure that contains per-

process descriptors for the code, data and stack segments. The LDT can also contain

call gate descriptors.

8

Interrupt Descriptor Table: an x86 protected mode data structure that contains interrupt gate descriptors, trap gate descriptors and/or task gate descriptors [8]. Each interrupt/exception service routine must contain an entry in the IDT to specify the address of the interrupt/exception handler, privilege level and other attributes.

Task State Segment: an x86 CPU data structure used to store the (registers) of a process. It also contains fields for the stack pointers and stack segments for privilege levels zero through two, a previous link field (used for linking nested tasks), an IO redirection map and an LDT segment selector.

Memory Management Unit (MMU): The hardware component of the CPU that is responsible for translating linear or virtual addresses to physical addresses. The MMU is typically part of the CPU and interfaces directly with the physical address .

Virtual Memory: a technique, which enables a process to access more memory than is physically available on a computer system by swapping memory pages to and from secondary storage. The page swapping is handled by the OS and is completely transparent to the process.

9

Chapter 2

OVERVIEW OF THE X86 CPU ARCHITECTURE

To understand how privilege level transitions occur during task switching, it is

necessary to have some general knowledge of the architectural components in the CPU

that facilitate task switching. In this chapter, a general overview of the x86 architecture is

presented, followed by sections that cover more specific x86 topics, such as protected

mode data structures, task switching and interrupt/. In CSC 159, the

target computer (which runs the OS that students create) contains an Intel Pentium II

CPU. The Intel Pentium II CPU is part of the 32-bit x86 family of processors, which

include Intel IA-32, and 32-bit AMD Processors.

2.1 Intel/AMD x86 Architecture Overview

The x86 architecture is a Complex Instruction Set Computer (CISC) architecture

based on the Intel 8086 and 8088 models. The x86 architecture is the most widely used

instruction set in non-embedded worldwide [3]. In 32-bit protected mode, an

x86 CPU consists of eight 32-bit general-purpose registers, six 16-bit segment registers, a

32-bit EFLAGS register and a 32-bit instruction pointer register. It also contains floating

point and Single Instruction Multiple Data (SIMD) registers. However, those registers will not be used in this project. Protected-mode is the native mode of an x86 CPU and it facilitates multitasking by enabling features and mechanisms to enforce protection of system resources such as memory and I/O ports. Real-mode is the mode the CPU is in when the system is first powered on or reset. Another mode, called system management

10

mode, is used for customizing certain features of the computer. Real-mode and will not be discussed further in this report because they are outside the scope of this project.

The 32-bit general-purpose registers are mainly used for performing arithmetic and logical operations, storing memory addresses (e.g. pointers), string processing, etc. The

ESP register is the stack pointer register which contains the address of the top of the stack. The stack is a data structure which is used during procedure call and return. The

EBP register is the base frame pointer register. This register is typically used in procedures to access arguments and local variables on the stack. Each argument and/or is located at some offset relative to the address stored in EBP.

The 16-bit segment registers are used to point to segment descriptors. They contain segment selectors which are used as indexes into either the Global Descriptor Table or

the Local Descriptor Table. The segment descriptor specifies the base address to use

when forming a linear . Segment descriptors also contain a two bit

Descriptor Privilege Level (DPL), the type of segment (code or data), the size of the

segment and other attributes. Most modern 32-bit OSes used a flat address space which

means that the segment descriptors in the GDT or LDT contain a base address of

0x00000000 and a size of 0xFFFFFFFF (4 GB). This typically simplifies the application

and OS programmer’s job because the virtual memory addresses are equivalent to linear memory addresses. The least significant two of the values stored in the segment registers indicates the Requested Privilege Level (RPL). The 16-bit CS register contains a

13-bit selector for the code segment descriptor in bits [15:3]. Bit 2 is used to indicate

11

whether the code segment descriptor is located in the GDT or LDT. The least significant

two bits [1:0] indicate the Current Privilege Level (CPL) of the process. The 16-bit DS

register is the register which is typically used for general memory accesses

and the 16-bit SS register is the stack segment register used for accessing the stack

memory. The stack segment is really just a special type of data segment. The other data

segment registers (ES, GS and FS) have a similar function as the DS segment register.

The data segment registers use the upper 13 bits for the segment selector, bit 2 is the table

indictor (GDT or LDT) and bits [1:0] are the RPL.

The 32-bit instruction pointer register (EIP) contains the address of the next

instruction that will be executed by the CPU. The value in EIP cannot be modified

directly. During program execution, its value is automatically incremented to the address

of the next instruction in memory unless a branch instruction (e.g. call or ) is reached.

A branch instruction will typically assign EIP a non-sequential memory address of a label or function. An exception or interrupt can also cause a branch to a non-sequential memory address. The 32-bit EFLAGS register contains status, control and system flags.

The status flags indicate the result of an arithmetic or logical operation. The system flags are used to control the state of the CPU (e.g. enable/disable interrupts, set trap flag for , set the Nested Task flag to indicate the current task is linked to another task, etc.). The one control flag is the direction flag which controls the direction of auto- incrementing the ESI and EDI index registers when processing strings. See Figure 1 for a diagram of the general-purpose and segment registers in an x86 CPU.

12

Figure 1. General-Purpose and Segment Registers in an x86 CPU

2.2 Protected Mode Structures

In protected-mode, the segment descriptors stored in the GDT and LDT are used to control access to memory regions with different privilege levels. Each segment descriptor contains a 2-bit Descriptor Privilege Level (DPL) which is typically set to either 00b

(protection-ring 0) for the kernel-mode code, data and stack segments or 11b (protection- ring 3) for the user-mode code, data and stack segments. When a user-mode process is dispatched by the kernel, the segment registers CS, DS, SS, ES, GS and FS should have both of the least significant bits [1:0] set to indicate that they are requesting access to user-mode segments with privilege level 3. Also, bits [15:3] should contain the proper segment selector indexes for the user-mode segment descriptors in the GDT or LDT. For

13

example, if the user-mode code segment descriptor is located at index 13 in the GDT, the

CS register should contain 0x006B = 0000000001101011b. Bits [15:3] are

0000000001101b (13 in decimal), bit 2 is 0b to indicate the segment descriptor is in the

GDT and bits [1:0] are 11b (3 in decimal) to indicate the RPL is 3. When the segment registers are loaded with segment selectors, numerous privilege check conditions must be satisfied for security reasons. For example, to access a data segment, the rule is that max (CPL, RPL) ≤ DPL [9]. The code and stack segments have similar rules but there are too many conditions to list here (see the Intel Software Developer’s Manuals [10] for more information). If the privilege check conditions are not met, the CPU generates a general-protection fault.

The Current Privilege Level (CPL) controls which instructions and memory are available to a process or task. As shown in Figure 2 below, applications run with the lowest privilege level (ring 3) and the kernel runs with the highest privilege level (ring 0).

Rings 1 and 2 are intermediate privilege levels that can be used for device drivers or other

OS services that need access to low-level resources such as I/O ports or kernel data structures. Many modern OSes, such as Windows and , don’t use rings 1 and 2

(only rings 0 and 3 are used) [11]. The reason for this is likely that it simplifies the OS design because using more privilege levels than is necessary can overcomplicate things.

Another important protected mode data structure is the Interrupt Descriptor Table

(IDT). This table contains interrupt and trap gates which are used to service hardware and software interrupts, exceptions, traps and faults. The IDT will be discussed in more detail in section 2.5 (Exceptions and Interrupts).

14

Figure 2. Protection Rings in an x86 CPU [11]

2.3 Task Switching

Task switching is necessary on a multitasking OS because only one task can run on

the CPU at a time. Since there are usually many runnable tasks on a system that are

competing for CPU cycles, there must be some mechanism to select which one gets to

run and for how long. The concept of time-slicing is often used to arbitrate the CPU

among the runnable tasks in the system. Each task runs until its time slice is reached.

Once a task reaches its time slice, it is placed back on the run queue and another task is

selected to run. This is where task switching must take place. During a task switch, it is imperative that the state of the task (context) is saved to memory. This allows the task to continue running from exactly the same state it was in when it reached its time slice. The data structure that is used to store the state of a task in memory is typically either a TSS or a trap frame. A trap frame is a data structure that stores the segment registers, general- purpose registers, EFLAGS, EIP, event type, etc. Trap frames will be covered in more

15

detail in Chapter 5 and Appendix A. A diagram of a 32-bit x86 TSS is shown in Figure 3 below.

Figure 3. 32-bit x86 Task State Segment [12]

16

It is not necessary to know the details of every field in the TSS, however, some of the important fields are SS0 and ESP0 which are the ring 0 stack segment and stack pointer, respectively. These fields are used to switch stacks when the task switches from ring 1, 2 or 3 back to ring 0. The previous task link field is used to link TSS’s together to form nested tasks. The CR3 field contains the base of the paging directory structure. The I/O map base address field is used for I/O permissions (this field is not relevant for this project). The SS1, ESP1 (ring 1 stack segment and pointer) and SS2, ESP2 (ring 2 stack segment and pointer) fields are also not relevant to this project since only rings 0 and 3 will be used. The remaining fields are self-explanatory.

There are multiple ways that a task switch can be achieved on an x86 system. One way is to create a Task State Segment (TSS) for each task (including the kernel).

Switching tasks in this way is called a hardware task switch because hardware circuits in the CPU control the task switch. To implement a hardware task switch from the kernel to a user-mode task, the following steps must be completed [2]:

1) A TSS must be created for the kernel.

2) The fields in the kernel TSS must be assigned the proper initial values. This

includes the segment registers, SS0 and ESP0, LDT link field, etc.

3) A TSS descriptor for the kernel TSS must be created and added to the GDT.

4) The Task Register (TR) in the CPU must be assigned the segment selector of the

kernel TSS descriptor. The ltr (load task register) instruction can be used to

accomplish this.

17

5) When a user-mode task is created, a TSS must be created and initialized for it.

The EIP field in the user-mode TSS should contain the entry point of the user-

mode process. The segment registers must contain the user-mode segment

selectors; ESP should contain the address of the top of the user stack; SS0 and

ESP0 should reference the process’s kernel stack.

6) A TSS descriptor for the user-mode process must be created and added to the

GDT.

7) The link field in the kernel TSS must be assigned the selector for the user-mode

TSS descriptor in the GDT.

8) The NT flag in the EFLAGS register must be set while in kernel-mode.

9) Finally, to dispatch the user-mode process, the iret instruction should be executed

by the kernel.

There are numerous other ways that hardware task switching can be accomplished on x86 systems. For example, a far call or jump to a TSS descriptor, a far call or jump to a task gate descriptor, a software or hardware interrupt that selects a task gate in the IDT, etc. [13]. These other hardware task switching mechanisms are mentioned here for completeness but will not be discussed in detail because they are outside the scope of this project.

According to [14], most modern Operating Systems implement context switching in

software. To implement software context switching, only one TSS needs to be created for

each CPU. Each process shares the TSS and the only TSS fields that are used are ESP0

and SS0 to specify the process’s kernel stack. There are many reasons why software

18

context switching is more common in modern OSes. One reason is that software context

switching is faster than hardware context switching. In addition, software context

switching is more portable (64-bit x86 systems do not support hardware context switching in 64-bit mode) and some features are only possible to implement with software context switching (e.g. changing process/ priority). In most cases, only

some of the CPU registers need to be saved and restored during a .

Hardware context switching is slower because it always saves and restores the entire state

of the register file in memory, which is usually not necessary. Software context switching

only saves and restores the registers that change from the prior task to the new task [15].

2.4 Virtual Memory and Address Translation

Memory management is a critical component of any computer system. Typically, the Operating System is responsible for managing a computer system’s memory. Each process needs to have some memory allocated to it to store its executable image, initialized and uninitialized global variables, stack and heap. When a process finishes executing or is terminated, the memory allocated to it must be freed so another process can use it. Furthermore, there must be some mechanisms to protect memory from unauthorized access. The x86 family of CPUs support virtual memory through the paging mechanism. Paging is supported by hardware registers in the CPU’s MMU and page tables, which are set up by the OS. Page tables facilitate memory protection through access rights within the structures.

19

The 32-bit x86 paging mechanism consists of hierarchical paging structures. As

stated in [16], when paging is enabled, three (CR3) contains the base

address of the page directory. The page directory can contain up to 1024 entries. Each

page directory entry is 4 in size. A used page directory entry will contain the upper

20 bits of the address of a page table in bits [31:12]. Bits [11:0] contain access rights and

other attributes such as page present, read/write, page size, etc. For this project, the

important attributes are the page present bit (bit position 0), the read/write bit (bit

position 1) and the user/supervisor bit (bit position 2). The page table entries are also 4

bytes in size and have a similar format as the page directory entries. The upper 20 bits of

a used page table entry points to a 4 KB page in memory. The upper and middle 10 bits

of a 32-bit linear address are used as indexes into the page directory and page table structures, respectively. The lower 12-bits are used to select an individual within a 4

KB physical page. Figure 4 contains a diagram on the x86 paging structures.

20

Figure 4. Paging Structures in an x86 Computer

Page tables allow the Operating System to protect memory from unauthorized reads or writes. From the process’s perspective, it thinks that it has access to all the memory in the system (e.g. a virtual address space from 0 to 4 GB on a 32-bit system). However, the process typically only has a small fraction of the 4 GB physical address space allocated to it. If a process tries to access memory that is part of its address space but the page is not in memory (page present bit is cleared), a exception will be raised and the page is read into memory from secondary storage by the page-fault handler. The instruction that caused the page fault is then restarted. Another scenario that can cause a page fault is when a process attempts to read or write to memory that is outside of its address space.

When this occurs, a page fault will be generated, but in this case, the reason for the page

21

fault is due to an invalid memory access. The page-fault handler can detect the reason for

the page fault by examining the error code that is pushed onto the stack (this happens

automatically when a page fault is triggered) [16]. In most cases this will result in a and abnormal termination of the process. In some cases, an invalid memory access will cause the OS to panic and shutdown (e.g. Windows Blue Screen of

Death). A third scenario is when a process tries to write to a page marked as read-only.

This type of page fault is another type of invalid memory access and the result is typically a segmentation fault.

2.5 Exceptions and Interrupts

Exception and interrupt handling are one of the fundamental responsibilities of an

Operating System. Exceptions such as faults, traps and aborts must be detected and handled by the appropriate exception handler routines to maintain system stability and uptime. Some examples of exceptions include divide by zero, , page fault, invalid opcode, , etc. Often times an exceptional condition will occur because of a programming error (e.g. segmentation fault caused by an array index going out of bounds) or a hardware issue (e.g. bus transaction error). It is the OS’s responsibility to try to recover from exceptions when they occur. If it is unable to recover, the OS will shut down or .

Software and hardware interrupts are mechanisms that allows a process or hardware device to request services from the kernel. Interrupts are similar to exceptions because they both interrupt the CPU and cause a branch to a service routine. The difference

22

between exceptions and interrupts is exceptions are typically a result of erroneous or

invalid conditions whereas interrupts are a request for service(s). In some cases, a

hardware interrupt is the result of a hardware malfunction of an I/O device, but more

commonly, it is the result of an asynchronous event (e.g. a network packet is received on

a NIC port, an I/O request is completed, etc.). Interrupts allow the OS to achieve higher

responsiveness, which typically improves the end user’s experience.

The Interrupt Descriptor Table (IDT) is a data structure used to store interrupt

descriptors for interrupt/exception service routines. The IDT is typically set up by the OS

during bootstrapping. In 32-bit protected mode, the IDT contains 8 byte descriptors for

interrupt or trap gates [8]. Interrupt gate descriptors are primarily used to specify the code

segment and offset (address) of the Interrupt Service Routine (ISR). They also contain

attributes such as privilege level, descriptor type, present bit, etc. Each interrupt is

identified by an integer from 0 through 255. On x86 CPUs, interrupts 0 through 31 are

reserved for CPU exceptions, however, only interrupts 0 through 20 are used [8].

Interrupts 32 through 255 are available for other purposes, such as a system call handler

(software interrupt) or a timer (hardware interrupt). A process or task can issue the int n instruction (where n is an integer) to request services from the kernel.

The corresponding interrupt service routine will be invoked in the kernel to service the process’s request.

According to [17], when an exception or interrupt occurs without a privilege change, the CPU pushes the EFLAGS, CS and EIP register values onto the stack (in that order).

For some types of exceptions, it also pushes an error code on the stack after pushing EIP.

23

The register values are pushed on the stack so that the task that was running can resume after the exception or interrupt is handled by the CPU. Restoring EFLAGS, CS and EIP is typically accomplished using the iret instruction which pops the values back into the respective registers. When an exception or interrupt occurs with a privilege change, the

CPU pushes the SS and ESP registers prior to pushing the EFLAGS, CS, EIP (and possibly an error code) onto the stack. See Figure 5 for a diagram of the stack after an exception or interrupt occurs.

24

Figure 5. Stack Contents after an Exception or Interrupt [17]

25

Chapter 3

OVERVIEW OF THE SPEDE X86 DEVELOPMENT ENVIRONMENT

Before discussing the methodology and implementation of enabling user-mode processes in the target OS, it is essential to understand the SPEDE development environment. This chapter covers the tools and utilities provided by SPEDE to facilitate the development of an OS. The first section contains an overview of SPEDE and the following sections cover some of the key SPEDE utilities, such as Flash, Flint and GDB.

The final section of this chapter describes the virtual SPEDE environment which allows

SPEDE and the target OS to run in virtual machines.

3.1 Overview of SPEDE

The Systems Programmer’s Educational Development Environment (SPEDE) consists of a , assembler, , makefile generation tool (spede-mkmf), Flames monitor, FLASH (Flames shell), FLINT (Flames interface) and a debugger (GDB). CSC

159 students develop an OS on a Linux host using the SPEDE tools. Compilation and assembly of the OS image is made simpler with the spede-mkmf tool. This tool creates a

Makefile from all the ., .cpp, .h and .S source/header files in the current working directory. The Makefile can then be used to compile and link the object files into an executable image (e.g. MyOS.dli). By default, the OS executable image is created as an

Executable and Linkable Format (ELF) file. After creating an OS executable, it can be downloaded to an x86 target computer where it is executed. The compiler and linker are set up to add symbolic debugging information to the executable file, which allows it to

26

run under the control of GDB. GDB allows the OS developer to set one or more

breakpoints in the OS code so that execution will pause when it reaches the breakpoint(s)

and the state of the registers, stack, variables and other data structures can be examined.

The host and target computers are connected using a RS-232 serial link [18]. The Flash

utility is used to download the OS executable image to the target (see Figure 6 below).

After downloading the executable image, GDB is invoked from Flash, which allows the

OS developer to single-step through the OS code as it runs or run the program normally.

Host Target

FLASH

RS-232 Serial Link

Figure 6. FLASH Downloads the OS Image from the Host to the Target

On the target, the Flames monitor is responsible for managing the serial connection with the host and bootstrapping the downloaded OS executable image. The target systems are configured to boot the MS-DOS Operating System, which invokes the Flames

27

monitor batch file (FLAMES.BAT) after the Power-On-Self-Test (POST) completes. The

Flames monitor puts the target into 32-bit protected mode, sets up the GDT, IDT and paging structures (e.g. CR3 and boot page tables) and turns on paging [18]. The boot page tables are set up so that the virtual addresses and physical address are equal

(identity-mapped paging) [18].

Table 1. Segment Descriptors Added to the GDT in the Target OS. [2]

Segment Offset from GDTR

Kernel Code 0x08

Kernel Data 0x10

Kernel Stack 0x18

User Code 0x30

User Data 0x38

User Stack 0x40

During bootstrapping of the target OS, segment descriptors for the kernel-mode and

user-mode segments are set up and added to the GDT by the Flames monitor. As can be

seen from Table 1 above, the kernel code segment descriptor is located at entry 1 (offset

0x8) in the GDT, the kernel data segment descriptor is located at entry 2 (offset 0x10)

and the kernel stack segment descriptor is located at entry 3 (offset 0x18). Similarly, the

user code, data and stack segment descriptors are located at entries 6, 7 and 8,

28

respectively. There is a GDT register in x86 CPUs (named GDTR) that stores the base address and size of the GDT. The offsets listed in the table are relative to the base address of the GDT (each segment descriptor is 8 bytes in size). By default, the target OS in CSC

159 does not make use of Local Descriptor Tables (LDTs). LDTs are descriptor tables that can be used to create code and data segments that are local to a specific task.

Creating LDTs is optional but the system must have one GDT. Once the bootstrapping phase is complete and the OS developer instructs Flash to run the OS program, the

Flames monitor passes control to the entry point of the OS which is the main() function.

At this point, kernel data structures should be initialized (e.g. current pid, run queue,

Process Control Blocks, etc.), interrupt gates added to the IDT, display terminals initialized, etc.

SPEDE provides a rich set of libraries to make OS development easier. Table 2 below lists the important library files and a summary of the relevant functions/macros they provide:

Table 2. SPEDE Library Files [18]

Library File Summary cons_printf(), cons_kbhit(), breakpoint(), etc. __BEGIN_DECLS, __END_DECLS, __ELF__, etc. assert() inportb(), outportb(), etc. get_idt_base(), get_cs(), get_cr3(), set_gdt(), etc. struct i386_gate, fill_gate(), fill_descriptor(), etc. CNAME(), ENTRY(), etc. struct i386_tss IER_ERXRDY, DATA, BAUDLO, BAUDHI, etc.

29

In the CSC 159 lab at CSUS, the host system technical specifications are:

. Dell Precision Tower 3620

. Intel® Core™ i7-6700 CPU @ 3.40 GHz (Skylake)

. 15.6 GB of RAM

. 814.2 GB

. Linux 2.6.32 (x86_64)

. CentOS Release 6.9 (Final) with GNOME ver. 2.82.2

Target system technical specifications are:

. Hewlett Packard Vectra PC

. Intel® Pentium® II-MMX @ 400 MHz

. 64 MB of RAM

. 4.3 GB IDE 3.5” Hard Drive (Maxtor 9043202)

. Diskette Drive A: 1.44 MB 3.5 inch

. Display Type: VGA/EGA

. Serial Ports: 3F8, 2F8, 3E8 and 2E8

. MS-DOS ( )

Dummy Terminal technical specifications:

. TeleVideo Model 910

30

Figure 7. SPEDE Environment in the CSC 159 Lab

3.2 FLASH - Flames Shell

Flash (Flames Shell) is a SPEDE utility that provides the OS developer with an interface to interact with the target system. Flash provides commands to download the OS executable image to the target system, invoke Flint (Flames Interface), capture a trace buffer on the target or invoke GDB. Once the OS executable is created using a Makefile, the OS developer can invoke Flash using the following command:

# flash MyOS.dli

Next, the OS image can be downloaded to the target by entering ‘d’ (download) at the

Flash prompt. The download command instructs Flash to set up a serial connection with the target system and download the OS executable file in 128-byte blocks. The code is loaded at physical memory address 0x101000 by default. Once the OS image has

31

downloaded to the target system, the ‘g’ (go) command will launch GDB. GDB is set up to break on the first instruction in the main() function. At this point, the OS developer can either type ‘s’ at the GDB prompt to single-step through the code, set a breakpoint at one or more lines of code (e.g. ‘b ’) and/or type ‘c’ to continue running the program. Once the program is running, commands can be entered from the keyboard on the target system. For example, students typically program their OS so that typing ‘b’ on the target keyboard will break into GDB on the host system.

3.3 Flint - Flames Interface

Flint (Flames Interface) is a utility that provides an interface to the target system hardware. This utility can be invoked directly from the Flash shell by entering “flint” at the Flash command prompt. Flint interfaces directly with the Flames monitor on the target over the RS-232 serial link. Flint provides commands to examine the target PCI

Bus, print GDT entries, print IDT entries, inquire about the CPU state, examine registers, manipulate memory, inquire about devices, etc. [18]. Flint is a powerful tool for debugging because it can provide more information on the state of the target system than

GDB can. For example, Flint can be used to examine the state of the CPU control registers, GDTR (Global Descriptor Table Register), IDTR (Interrupt Descriptor Table

Register), etc. which is not possible with GDB. If an exception occurs, Flint can be used to decode the error code and provide additional debugging information such as a memory dump [18].

32

3.4 GNU Debugger

The GNU Debugger (GDB) is configured to debug the OS running on the remote target system from the Linux host. In order for GDB to debug the OS program running on

the target, it must be compiled and linked with the ‘-g’ option to add debugging symbols

to the OS executable image [18]. The Makefile created by spede-mkmf will automatically

add this option to the compilation and linking commands. Additionally, a GDB remote

stub must be linked with the OS executable file. This stub is used by GDB running on the

host to inquire about the state of the OS running on the target [18]. There is a script file named “GDB159.RC” that is used to set up the parameters for communicating with the target system (e.g. baud rate, serial port , etc.). This file is stored in a location on the host system that is known by Flash. When GDB is invoked from Flash, it will check the current working directory for the “GDB159.RC” file. If it is not found, it will copy it to the current working directory before invoking GDB.

GDB is an invaluable debugging tool for debugging issues in the target OS. The state of the CPU registers can be examined (‘i r’ command); breakpoints can be set to pause program execution (‘b’ command); the function can be examined (‘bt’ command); memory can be examined (‘x’ command); functions can be called, etc. GDB is the most commonly used debugger used in CSC 159 in recent years and it is used extensively in this project. After changing privilege rings, the state of the CPU registers will be examined using GDB to determine whether the ring change was successful or not.

33

Figure 8. GDB Invoked from Flash on the Host System

3.5 Virtual SPEDE

Virtual SPEDE allows SPEDE and the target OS to run in virtual machines (VMs).

In CSC 159, students often use Virtual SPEDE to complete their programming assignments in the convenience of their own home. Oracle VM VirtualBox provides the software to run the SPEDE OS and target OS in VMs. VMs can also be created for terminal displays and/or a printer device. The Virtual SPEDE environment provides an environment that is almost identical to the CSC 159 lab environment. The only major difference is that in Virtual SPEDE the host, target, terminals, etc., are

34

running on the same physical system. Communication between the host and target still

uses serial ports but in Virtual SPEDE, the serial ports are configured in “Host Pipe”

mode. Host pipes allow VMs to communicate over “virtual” serial ports using the RS-232

protocol. Figure 9 below shows a screenshot of the virtual SPEDE environment.

Figure 9. Virtual SPEDE Environment

Virtual SPEDE Host System Configuration:

. Ubuntu 13.04 (Codename: Raring Ringtail)

. Linux 3.8.0-35-generic (32-bit)

. 1 Virtual CPU

. 1 GB RAM

35

. PIIX3 (PCI IDE ISA Xcelerator)

. USB Tablet (Pointing Device)

. 128 MB Video Memory

. SATA Storage Controller

. 10 GB LinuxPC-disk1.vmdk (Virtual Machine Disk)

. COM1 Serial Port (0x3F8)

. COM1 Serial Port/File Path: \\. \pipe\LinuxPCtoTargetPC

Virtual SPEDE Target System Configuration:

. MS-DOS (Microsoft Disk Operating System)

. 1 Virtual CPU

. 32 MB RAM

. PIIX3 Chipset (PCI IDE ISA Xcelerator)

. PS/2 Mouse (Pointing Device)

. 16 MB Video Memory

. IDE Storage Controller

. 102 MB TargetPC-disk1.vmdk (Virtual Machine Disk)

. COM1 Serial Port (0x3F8)

. COM2 Serial Port (0x2F8)

. COM1 Serial Port/File Path: \\. \pipe\LinuxPCtoTargetPC

. COM2 Serial Port/FilePath: \\. \pipe\TargetPCtoTerminal1

36

Chapter 4

METHODOLOGY

This chapter describes the methods used to achieve the project objectives. The methods for creating and dispatching user-mode and kernel-mode processes are covered in the first two sections, respectively. The third section covers methods to provide memory protection and the last section discusses interrupt and exception handling methods.

4.1 Creating User-Mode and Kernel-Mode Processes

In an Operating System, creating a process consists of allocating resources to manage the process (e.g. Process ID, Process Control Block, trap frame, etc.), setting up its execution environment (e.g. allocate memory, set up page tables, etc.) and adding the process to the run queue. This can be achieved using a handler function (e.g.

NewProcHandler()) which can be invoked from main() as part of the kernel initialization process. The NewProcHandler() routine could also be invoked from the kernel when a key is pressed on the target system (e.g. ‘n’ key). Another way that processes can be created is by providing Fork() and Exec() system calls. All -based OSes provide this

Application Programming Interface (API) for process creation. Processes can make a copy of themselves using Fork() and then overlay their executable image with another program using Exec(). CSC 159 students are exposed to both techniques in their programming assignments. In this project, the method I chose is the first approach (using

37

a NewProcHandler() function). The reason for choosing this method is because once

there exists a mechanism to create user-mode processes, the pre-existing Fork() system

call should be able to create a new user-mode process from an existing one. It may

require some minor changes to the pre-existing Fork() code, but it’s not likely to require

any major changes.

In a typical modern OS, most processes run in user-mode. However, there is often a need to run some processes only in kernel-mode. One example of this is the system idle process which runs when the run queue is empty. The system idle process typically runs in kernel-mode, does not ever request any resources, and never blocks [18][19]. It is necessary to have a system idle process because if there are no processes available to run, the CPU cannot just halt until a process becomes available to run. Some code must be executing on the CPU at all times unless the computer enters a power-saving mode (e.g. sleep mode). Another example of kernel-mode processes are low-level services such as kernel-mode device drivers. Device drivers are software modules that interface directly with hardware devices and provide an interface for applications to communicate with the device. To allow for the creation of both user-mode and kernel-mode processes, two separate handler functions were created: one for creating user-mode processes:

NewUserProcHandler() and one for creating kernel-mode processes:

NewKernelProcHandler(). The NewKernelProcHandler() function is very similar to the

NewProcHandler() routine that already exists in the CSC 159 kernel. However, the

NewUserProcHandler() routine contains some significant changes. The implementation

38

of the NewUserProcHandler() and NewKernelProcHandler() functions will be further discussed in Chapter 5 and the source code is listed in Appendix A.

4.2 Dispatching User-Mode and Kernel-Mode Processes

After a user-mode process is created, it is added to the run queue, which stores the

Process IDs of processes waiting to run on the CPU. The process scheduler is responsible

for selecting a process to run from the run queue. After a process is selected to run, it has

to be dispatched by the kernel. Dispatching a process involves loading the CPU registers

with the process’s context and passing control to it. As was mentioned in Chapter 2, this

can be achieved in multiple ways. One way is to create a TSS for the new process and

link it with the kernel’s TSS. Then the iret instruction is executed to switch from the

kernel task to the process task (hardware context switching). Another way is to create

only one TSS for the kernel and use some other data structure to store the process’s

register context (e.g. a trap frame). The register context stored in memory can be loaded

into the CPU using x86 assembly instructions such as popl, popa and mov. Then the iret

instruction is executed to perform the task switch. The later approach is referred to as

software context switching. For this project, I decided to use software context switching

because it is much more common in modern OSes. Furthermore, implementing hardware

context switching would have required more drastic changes to the existing CSC 159

target OS data structures. Using hardware context switching would have removed the

need for the trap frame and each process would need to have a TSS stored in kernel

memory.

39

4.3 Memory Protection for User-Mode Processes

Page tables provide a memory protection mechanism so that processes do not read

or write data they are not authorized to access (e.g. part of another process’s address

space or kernel memory). When a process is created in the target OS, it needs to be

allocated some memory for its text section, data section and stack. When paging is

enabled, every memory address is interpreted as a virtual memory address by the MMU.

The segmentation unit in the MMU converts each virtual memory address to a linear

address which then gets translated a second time to a physical address by the paging unit.

Since the segment descriptors in the target OS are set up to use a flat address space, the

virtual memory address is the same as the linear memory address. However, virtual

memory addresses are typically not the same as physical addresses (unless identity-

mapped page tables are used). The target OS kernel uses identity-mapped paging, so

kernel-mode processes will not need to have page tables set up (they can use the kernel’s

identity-mapped page tables). User-mode processes will need to have page tables set up.

This is accomplished as a part of the NewUserProcHandler() routine.

4.4 Handling Exceptions and Faults Generated by User-Mode Processes

On x86 systems, there are about twenty different types of exceptions that can occur

[22]. In this project, the two types of exceptions that will be handled by the kernel are page-fault exceptions and general-protection fault exceptions. A page-fault handler is necessary to allocate pages to a process that needs more memory and also to keep the

40

system up and running when a process attempts to access memory outside of its address

space. In this project, the main reason for a implementing a general-protection fault handler is to prevent the system from crashing when a process attempts to execute a privileged instruction. It is also beneficial to have both types of exception handlers because both page-faults and general-protection faults can occur during task switches if the segment registers do not contain the proper values and/or the page tables are not set

up properly. Other types of exceptions, such as divide by zero, overflow, invalid opcode,

etc., are possible in the target OS but writing handlers for them is outside the scope of

this project. Chapter 5 and Appendix A contain more details on the implementation of the

page-fault handler and general-protection fault handler routines.

41

Chapter 5

IMPLEMENTATION

This chapter describes the implementation of enabling user-mode processes in the target OS. It gives an overview of the critical data structures necessary to support user- mode processes and a high-level description of the algorithms used to implement user- mode process creation and dispatch. Additionally, the page-fault handler and general- protection fault handler algorithms are analyzed and discussed at a high level. Note that only some portions of the source code can be disseminated in this report to maintain the integrity of the CSUS CSC 159 programming assignment phases.

5.1 Organization of the Target OS Source Code

At the end of the semester, the CSC 159 target OS source code consists of about 14 source files and around 2500 to 3000 lines of code. The additional source code to enable user-mode processes in the target OS and implement the page-fault and general- protection fault exception handlers consists of about 180 additional lines of code. Table 3 contains a complete listing of all the source files along with a high-level summary of their content:

42

Table 3. Target OS Source Files

Source File Name Summary

Contains the main() and Kernel() functions. The main() function is the entry point of the target OS and contains code to initialize main.c kernel data structures. The Kernel() function is invoked every time an interrupt or exception occurs. It consists of code to call the appropriate handler routine based on the event type.

Definitions of exception and interrupt handling functions (e.g. handlers.c GetPidHandler(), TimerHandler(), NewUserProcHandler(), etc.)

syscalls.c System call definitions (e.g. GetPid(), Write(), Sleep(), etc.)

Utilities to manage the OS data structures (e.g. EnQ(), DeQ(), tools.c MyMemcpy(), etc.)

All kernel-mode and user-mode processes are defined here (e.g. proc.c SystemProc() and UserProc(), etc.)

Assembly routines to dispatch user-mode and kernel-mode events.S processes, handle interrupts and exceptions, etc.

Kernel data structures and constants are defined in this header types.h file (e.g. PCB, ready queue, run queue, trap frame, page structure, PROC_NUM, PAGE_SIZE, etc.) Declarations of kernel data structures (e.g. run_pid, ready queue, data.h run queue, array of PCB’s, array of kernel process stacks, etc.) handlers.h Declarations of handler functions defined in handlers.c

syscalls.h Declarations of system call functions defined in syscalls.c

tools.h Declarations of utility functions defined in tools.c

proc.h Declarations of process functions defined in proc.c

Declarations of process dispatch and event handling functions events.h defined in events.S

Includes SPEDE library files (e.g. , spede.h , , etc.)

43

5.2 Kernel Data Structures Overview

Table 4 contains a summary of some of the critical kernel data structures in the target OS. This will provide more context when viewing the source code in this chapter and Appendix A.

Table 4. Target OS Kernel Data Structures

Data Structure Summary

run_q Contains the PIDs of processes waiting to run on the CPU

ready_q Contains available PIDs run_pid PID of the currently running process proc_frame_p Pointer to process’s trap frame

pcb_t pcb[] Array of process control blocks

page_t page[] Array of page descriptors

proc_kernel_stack[][] Process kernel stacks

kernel_tss Kernel Task State Segment

GDT_p Pointer to the Global Descriptor Table

IDT_p Pointer to the Interrupt Descriptor Table

kernel_cr3 Contains the kernel’s CR3 register value

44

5.3 Kernel Data Structures to Enable User-Mode Processes

To enable CPU mode switching in the target OS, some of the existing kernel data structures had to be modified and some additional data structures had to be added to the kernel. A listing of the additional and modified data structures is shown below:

Additional Kernel Data Structures:

. Kernel Task State Segment (main.c)

struct i386_tss kernel_tss;

. Kernel Task State Segment Descriptor (main.c)

fill_descriptor(&GDT_p[9], (unsigned)&kernel_tss, kernel_tss_limit, TSS_PRESENT | ACC_TSS, 0x0);

. General-Protection Fault Handler Interrupt Gate Descriptor (main.c)

fill_gate(&IDT_p[GEN_PROT_FAULT_EVENT], (int)GenProtFaultEvent, get_cs(), ACC_INTR_GATE, 0);

. Page-Fault Handler Interrupt Gate Descriptor (main.c)

fill_gate(&IDT_p[PAGE_FAULT_EVENT], (int)PageFaultEvent, get_cs(), ACC_INTR_GATE, 0);

Modified Kernel Data Structures:

. Three new fields were added to the trap frame (new fields are in bold text):

typedef struct { unsigned short gs; unsigned short _notgs; unsigned short fs; unsigned short _notfs; unsigned short es; unsigned short _notes; unsigned short ds;

45

unsigned short _notds; unsigned int EDI; unsigned int ESI; unsigned int EBP; unsigned int ESP; unsigned int EBX; unsigned int EDX; unsigned int ECX; unsigned int EAX; unsigned int event_type; unsigned int error_code; /* exception error code */ unsigned int EIP; unsigned int CS; unsigned int EFL; unsigned int user_esp; /* user stack pointer */ unsigned int user_ss; /* user stack segment */ } proc_frame_t;

o When the iret instruction is executed to switch from kernel-mode to user- mode, the CPU expects the user-mode stack pointer and stack segment

values to be on the run-time stack above EFLAGS. When the CPU mode

switches from user-mode to kernel-mode (e.g. system call), the user-mode

stack segment and stack pointer values are pushed onto the stack before

pushing EFLAGS, CS and EIP.

o The error code is pushed onto the stack below EIP for some types of exceptions (e.g. page-faults, general-protection faults, etc.).

. One additional field was added to the PCB structure to store the process type

(shown in bold text below):

typedef enum {KMODE_PROC, UMODE_PROC} proc_type_t;

46

typedef struct { proc_type_t proc_type; /* stores the process type */ state_t state; . . . proc_frame_t *proc_frame_p; } pcb_t;

. Both DPL bits must be set in the system call handler interrupt gate descriptor.

This is achieved by bitwise or of the ‘ACC_PL_U’ constant with the

‘ACC_INTR_GATE’ constant:

fill_gate(&IDT_p[SYSCALL_EVENT], (int)SyscallEvent, get_cs(), ACC_INTR_GATE | ACC_PL_U, 0);

5.4 Kernel Task State Segment Implementation

To create a TSS for the kernel, I used a struct i386_tss (defined in the spede library file ). The kernel TSS structure is defined as a global variable in the main.c file. During kernel initialization, the segment fields in the TSS are populated with the kernel-mode segments and a TSS descriptor for the kernel TSS is added to the

GDT at entry number 9. A small assembly routine named LoadKernelTss() loads the kernel TSS into the Task Register (TR) after the TSS descriptor has been added to the

GDT. The tss_esp0 field in the kernel TSS is assigned to the top of the process’s kernel stack just prior to process dispatch. The tss_ss0 and tss_esp0 fields specify the location of the process’s ring 0 stack (kernel stack). This is necessary because when a process switches from user-mode to kernel-mode the stack will switch to the kernel-mode stack using the tss_ss0 and tss_esp0 values in the kernel TSS.

47

5.5 API for Creating and Dispatching User-Mode Processes

To create a user-mode process, the NewUserProcHandler() function can be invoked.

This function takes one parameter: a function pointer that points to the function that implements the user-mode process. The NewUserProcHandler() routine can be called from within main() or the Kernel() function. This function will set up the new user-mode process’s PCB, trap frame, page tables, etc. When a user-mode process is created in the target OS, five pages are allocated for it to use. The reason that five pages are used is because it is necessary to have pages for the following:

1) Page Directory Structure

2) Instruction Page Table

3) Instruction Page

4) Stack Page Table

5) Stack Page

The user-mode processes created in CSC 159 may or may not have a data section for initialized and uninitialized global variables. If they have a data section, it can be stored along with the text section in the Instruction Page. It is not likely that the size of a user- mode process’s text and data sections combined will be greater than the size of one page

(4 KB). A highly sophisticated OS (e.g., Windows, Linux, etc.) would typically allocate separate pages for the text and data sections. However, for CSC 159, the processes are much simpler and do not need this separation.

48

Once five pages are allocated for a newly created user-mode process, page tables need to be set up with the proper permissions. The high-level steps necessary to set up page tables include:

1) Copying the executable (text section) into the instruction page

2) Copy the first four entries in the kernel page table into the page directory page

3) Assign the address of the instruction page table to the appropriate entry in the

page directory

4) Assign the address of the stack page table to the appropriate entry in the page

directory

5) Assign the address of the instruction page to the appropriate entry in the

instruction page table

6) Assign the address of the stack page to the appropriate entry in the stack page

table

7) Use the set_cr3() function to assign CR3 with the base address of the page

directory

The entries selected in steps 3 through 6 depend on the virtual memory addresses chosen by the OS developer. For example, one could use a virtual address of 1 GB (0x40000000) for the base of the text section (instructions) and a virtual address of 2 GB – sizeof(trap_frame_t) = 0x7FFFFFB4 for the top of the stack (sizeof(trap_frame_t) = 76 bytes). Seventy-six is subtracted from 2 GB so that the process’s trap frame can be stored in the upper 76 bytes of the page. Since the stack grows down on x86 systems, the virtual memory region from 0x7FFFF000 to 0x7FFFFFFF would be allocated to the 4 KB stack

49

page. When setting up the page directory and page table entries, the present, read/write

and user/supervisor bits (bits [2:0]) must be set to enable user-mode read/write access.

The routine to dispatch a process in the target OS was implemented in x86 assembly

by the CSC 159 OS developers. The name of the routine is ProcLoader() and it gets

passed a pointer to the process’s trap frame. This routine is responsible for loading the

data segment registers from the trap frame using the popl instruction and loading all the

general-purpose registers using the popa instruction. It also loads the EIP, CS and

EFLAGS registers (and ESP and SS if there is a privilege change) using the iret

instruction. After executing the iret instruction, control transfers from the kernel to the

process.

5.6 API for Creating and Dispatching Kernel-Mode Processes

To create a kernel-mode process, the NewKernelProcHandler() routine can be invoked from the main() or Kernel() functions. This function is similar to the

NewUserProcHandler() routine, however, it does not require as much code because kernel-mode processes do not need to have page tables setup. Another key difference is

the kernel-mode process’s trap frame does not get copied into a stack page. Kernel-mode

processes only use the ‘proc_kernel_stack’ data structure for their stacks. Since they

always run in kernel-mode, there is no need to have an additional memory page allocated

for their stacks. Additionally, the text section is not copied into an instruction page like it

is in the NewUserProcHandler() function. The kernel-mode text section remains in the

50

same physical memory region set up by the compiler and linker. Kernel-mode processes use the same ProcLoader() dispatch routine as user-mode processes.

5.7 Handling Software Interrupts Generated by User-Mode Processes

When a user-mode process wants to request a service from the kernel, it executes a software interrupt by executing an int n instruction (e.g. system call). In the CSC 159 OS

(and most Unix-based OSes) the system call handler routine is stored at interrupt vector number 128. The system call type is stored in the EAX register and the kernel uses this value to determine which system call handler routine to invoke. Some examples of system calls include: GetPid(), Read(), Write(), Sleep(), Fork(), etc. Each system call has a corresponding handler routine (e.g. GetPidHandler(), ReadHandler(), WriteHandler(), etc.) To grant access to the system call interrupt gate from ring 3, the DPL bits need to be set to 11b (3 in decimal) in the system call interrupt gate descriptor.

5.8 Page-Fault Handler Implementation

Page fault exceptions are generated when a process accesses memory that is either temporarily unavailable or inaccessible for security reasons. When a page fault occurs, an error code is automatically pushed onto the stack. This error code indicates if the page fault was caused by a protection violation or a non-present page (bit 0), a page read or write (bit 1) and which mode (user or kernel mode) the CPU was in when the page fault occurred (bit 2) [20]. Bit 3 of the page-fault error code indicates that the page fault was

51

caused by setting a reserved bit in either the page directory entry or page table entry [20].

Bit 4 of the error code indicates that the page fault was caused by an instruction fetch

[20]. These last two errors are very unlikely to occur in the target OS so they will not be

considered in the page-fault handler. Page faults caused by kernel-mode processes are

also unlikely to occur but if they do occur the kernel will panic. The CR2 register

contains the linear address that caused the page-fault.

The high-level flowchart, as show in Figure 10, shows the page-fault handler logic.

The page-fault exception number is 14 (0xE), so an interrupt gate for the page-fault exception handler must be added to entry 14 in the IDT. It should be noted that this implementation of a page-fault handler is a highly simplified one and does not make use of any page-replacement algorithms. Implementing a page-replacement algorithm is possible, but there has to be a hard disk running on the target system to read/write pages to disk (swap space). A CSUS graduate student named Jayaram Durairaj implemented a disk device driver for the target OS [23], so it is theoretically possible to write a page-fault handler that utilizes the hard drive for page replacement. However, implementing a sophisticated page-fault handler that supports page replacement is outside the scope of this project.

52

Page Fault Occurred

PF occurred in No user-mode?

Yes

PF caused Set present bit in No Process needs an No by protection page table for non- additional page? violation? present page

Yes Yes

Kill the process that caused the PF One or more No pages are free?

Yes

Allocate a free page to process

Exit Page Fault Handler

Figure 10. Flowchart for Page-Fault Handler

53

5.9 General-Protection Fault Handler Implementation

General-protection faults are usually caused by a privilege or protection violation.

Some examples include a user-mode process loading the CS register with a ring 0 code segment or executing a privileged instruction in user-mode. When a general-protection fault occurs due to a segment or interrupt gate descriptor privilege violation, the error code format indicates the table (GDT, LDT or IDT) and selector index of the descriptor that caused the fault. For all other types of general-protection faults, the error code is 0

[21]. For this project, general-protection faults caused by a segment or interrupt gate descriptor privilege violation will cause the kernel to panic. Any other type of general- protection fault will cause the process to be terminated by the kernel. A general- protection fault handler must be added to entry number 13 (0xD) in the IDT. Figure 11 shows the general-protection fault handler logic.

Figure 11. Flowchart for General-Protection Fault Handler

54

Chapter 6

TESTING

This chapter covers the testing approach used to verify that the software meets the requirements. It also contains a section that lists the test cases and the expected results for each test case. Chapter 7 will evaluate the test results.

6.1 Testing Approach

In order to test whether the software met the project requirements, it was essential to run various tests. One of the most important tests was to verify that the CPU mode switches to user-mode after dispatching a user-mode process. Another important test was to verify that the CPU mode switches to kernel-mode after a software or hardware interrupt. I used GDB to examine the state of the register file after privilege ring changes.

I initially tested this with paging disabled and then with paging enabled. The reason for this is because paging requires code . When the code is relocated, GDB loses the line number information which makes it more difficult to break into GDB using breakpoints. With paging enabled, the workaround I used is to examine the contents of the user-mode process’s trap frame by setting a breakpoint within the kernel. The trap frame contains the state of the segment registers which can be used to determine the CPU mode just prior to entering the kernel. Another technique I used is to add code to the user- mode process that generates a divide-by-zero exception. Since there is no divide-by-zero exception handler, a segmentation fault occurs when the exception is raised and

55

execution halts in the user-mode process. I could then see the state of the registers using

GDB’s ‘i r’ command.

Some additional tests include writing code to intentionally invoke the page-fault and general-protection fault handler routines to execute the various code paths within the

exception handlers. For the page-fault handler, this includes causing a page-fault due to

both valid and invalid memory accesses in user-mode and causing a page-fault due to an

invalid memory access in kernel-mode. Additional tests were designed to test the scenario that all memory is allocated to processes and no free pages exist. For the general-protection fault handler, I created a test to verify that the process is terminated if

it attempts to execute a privileged instruction. I also created a test to cause the kernel to

panic in the general-protection fault handler because of a descriptor privilege violation.

Additional scenarios that were tested to ensure the system is robust include testing

the system with multiple combinations of user-mode and kernel-mode processes

executing simultaneously. These test cases were designed to ensure that there are no

conflicts among the running processes on the system. Running many different processes

simultaneously can result in resource contention issues. The target systems in the CSC

159 lab contain 64 MB of memory so it is not likely that the system would run out of

memory. Each page is 4 KB and the maximum number of pages in the system is set to

100 so the maximum amount of memory that could be consumed by processes is 100 * 4

KB = 400 KB. The size of the target OS executable image is not likely to exceed about

120 KB so the total memory consumed by processes and the kernel is not likely to exceed

56

about 520 KB. However, it is still worth testing because as memory pressure increases

the probability of errors, exceptions or a system increases.

6.2 Test Cases

Table 5 lists the test cases I used to verify that the system is robust and meets the requirements of the project.

Table 5. Test Cases to Verify the Software Requirements

Test Test Case Description Expected Results Case #

Test that the ProcLoader() function is The user-mode process should able to dispatch a user-mode process begin executing in user-mode 1 (prerequisite: a user-mode process must without any errors, exceptions, be created with the faults or crashes. NewUserProcHandler() function).

Test that the ProcLoader() function is The kernel-mode process should able to dispatch a kernel-mode process begin executing in kernel-mode 2 (prerequisite: a kernel-mode process without any errors, exceptions, must be created with the faults or crashes. NewKernelProcHandler() function).

The CPU mode should switch Test the state of the system when a user- from user-mode to kernel-mode 3 mode process invokes a system call. without any errors, exceptions, faults or crashes. Test the state of the system when a The CPU mode should switch hardware interrupt occurs while a user- from user-mode to kernel-mode 4 mode process is running (e.g. timer without any errors, exceptions, interrupt). faults or crashes.

57

Intentionally cause a page-fault in user- The present bit should be set in mode due to a valid memory access. In 5 the page table entry for the non- this case, the process does not need an present page. additional page.

Intentionally cause a page-fault in user- A free page should be allocated mode due to a valid memory access. In to the process in the proper page 6 this case, the process needs an table entry corresponding to the additional page and one or more free address that caused the page pages are available in the system. fault.

Intentionally cause a page-fault in user- A kernel panic should occur with mode due to a valid memory access. In a message indicating that no free 7 this case, the process needs an pages are available in the additional page and there are no free system. pages available in the system. The user-mode process should Intentionally cause a page-fault while in be terminated and another 8 user-mode due to an invalid memory process should be selected to access run. Intentionally cause a page-fault while in A kernel panic should occur with 9 kernel-mode due to an invalid memory a message indicating that a page access fault occurred in kernel-mode.

The user-mode process should Intentionally cause a general-protection be terminated and another 10 fault in user-mode by executing a process should be selected to privileged instruction run.

Intentionally cause a general-protection A kernel panic should occur with fault in user-mode by loading the DS a message indicating the general 11 register with 0x13 (a kernel-mode data protection fault error code (error segment selector with RPL = 3) prior to code should be 0x10). dispatching a user-mode process.

58

Create 15 or more user-mode processes There should be no errors, 12 on the system (maximum total number exceptions, faults or crashes of processes is 20)

Create 15 or more kernel-mode There should be no errors, 13 processes on the system (maximum total exceptions, faults or crashes number of processes is 20)

Create 8 user-mode processes and 7 or There should be no errors, 14 more kernel-mode processes (maximum exceptions, faults or crashes total number of processes is 20)

For most of the test cases listed above, it was very straightforward to set up and

execute the test case. However, there are a few test cases where it may not be obvious as

to how it could be set up and run. For example, to set up the test cases that require a page- fault to be generated by a valid memory access, page tables were set up for the process with all the proper permissions except for the present bit which was set to 0. When the present bit is cleared in the page table entry, it means that the page is not in main memory. When the process tries to access the page, a page-fault will be generated with an error code indicating that a valid memory request was made for a page that is not present

(bit 0 will be cleared in the error code) [20]. For the test case where a page-fault must be generated by an invalid memory access in user-mode, this can be accomplished by trying to read or write a kernel data structure such as the run_pid, run_q, array of PCB’s, etc.

This method can only be used with user-mode processes because kernel-mode processes should be able to directly access and modify kernel data structures. For invalid memory

59

accesses, bit 0 in the error code will be set to indicate a protection violation [20]. For the test case where a page-fault must be generated by an invalid memory access in kernel- mode, this can be accomplished by assigning EIP a virtual memory address that does not have page tables set up.

60

Chapter 7

RESULTS AND EVALUATION

This chapter provides the results of the tests listed in the previous chapter. It also

provides an evaluation of the test results and the system as a whole. All the test cases

passed, however, it was discovered that GDB is reporting stale values in the ESP and SS

registers after switching to user-mode (test case # 1). I was able to verify that the contents

of the ESP and SS registers contained the expected values. The way I achieved this is by

adding the following inline assembly instructions to the user-mode process: ‘asm(“movl

$0, %ecx”); asm(“movl $0, %edx”); asm(“movl %esp, %ecx); asm(“movw %ss, %dx);’.

These instructions essentially clear the ECX and EDX registers and then copy the values in the ESP and SS registers to the ECX and DX registers, respectively. At this point, I used GDB’s ‘i r’ command to view the state of the registers and the values in the ECX and EDX registers were the expected ESP and SS values while the values in ESP and SS where stale values from before the user-mode process was dispatched (see Figure 12 on the next page). In Figure 12, I placed red boxes around the ESP and SS registers which contain stale values. I placed a green box around the ECX and EDX registers which contain the actual values in the ESP and SS registers, respectively. In this figure, you might notice that EIP does not appear to contain a virtual memory address. This is because I disabled paging and assigned the physical memory address of the UserProc1() function to EIP. I did this so that I could set a breakpoint in the user-mode process and see the state of the register file using GDB.

61

While it is difficult to be certain, I am confident that a bug in GDB is causing this issue. Since GDB is set up to connect to the target system over a remote RS-232 serial connection, bugs like this can occur because of timing issues. Each time the state of the target changes, the change(s) must be propagated back to the host system so GDB can update its internal state. The communication between the GDB host and target system are over a serial connection which may cause intolerable transmission delays during CPU mode-switches (this has not been verified nor confirmed, it’s just a prediction). Note that this was tested with one of the latest versions of GDB (version 7.12.1) released in

January 2017 [24].

Figure 12. State of the Register File after Switching to User-Mode

62

The other test cases all passed without any issues. During the different page-fault and general-protection fault scenarios, the system responded with the expected behavior.

Overall, there were no performance issues seen in the system and no other issues were discovered during testing. The system was able to handle running up to 20 processes simultaneously without any issues.

63

Chapter 8

CONCLUSION AND FUTURE WORK

This project was successful at meeting all the objectives defined in the preliminary

stages of the project. For the most part, the privilege ring transitions went smoothly once

the register file and kernel data structures were set up properly. It took a lot of trial and

error to find the proper state of the segment descriptors, TSS, trap frame, page tables,

etc., so that the CPU mode-switching could be accomplished without any issues. The

work done in this project makes the target OS function more like a typical OS where the

CPU mode switches to user-mode when running processes and runs in kernel-mode in the

kernel.

At some point in the future, it is expected that there will be a user-mode

programming assignment phase added to CSC 159 that incorporates the work done in this

project. Some possibilities for the assignment include implementing the page-fault and/or

general-protection fault handlers from scratch using the same or different algorithms,

adding synchronization mechanisms to allow user-mode processes to share memory, etc.

There are countless ways that this work can be expanded in the future. A couple of ideas I

have are: adding a thread library to support multi-threaded user-mode

processes and adding user space library functions such as malloc() and free() that invoke

a sbrk() system call to dynamically allocate/deallocate memory on the heap.

To further improve the system, it would be helpful to find a way to break into GDB

in the user-mode process. As was mentioned in chapter 6, when the code is relocated

(copied to the instruction page), the line number information is lost and it is difficult (if

64

not impossible) to set a breakpoint and break into GDB in the user-mode process. One

method I tried is to set breakpoints on the physical addresses in the instruction page,

however, this method did not work.

It would also be helpful to get a better understanding on why GDB is reporting stale

values in the ESP and SS registers after switching to user-mode. If it were determined to

be a bug in GDB, then it would be helpful to get the issue fixed by the GDB maintainers

so the registers show the correct values after switching to user-mode. Not being able to

see ESP’s actual value from within a user-mode process is limiting because you cannot tell where the stack pointer is pointing.

It would also be interesting to test running user-mode processes in other CSC 159 phases to see if there are any compatibility issues. Furthermore, as mentioned in Chapter

4, it would be useful to implement the Fork() and Exec() system calls to work with user- mode processes.

65

Appendix A

SOURCE CODE

Note: In order to maintain the integrity of the CSUS CSC 159 programming assignment phases, not all of the source code can be disseminated in this report. Some portions of the source code are in pseudo code and other portions of the code are intentionally omitted.

File: types.h

/* Kernel data structures and constants for process and memory management */

#ifndef _TYPES_H_ #define _TYPES_H_

#define PAGE_NUM 100 /* maximum number of pages */ #define PROC_NUM 20 /* maximum number of processes */ #define PAGE_SIZE 4096 /* page size is 4 KB */ #define GEN_PROT_FAULT_EVENT 13 /* GP fault exception number */ #define PAGE_FAULT_EVENT 14 /* PF exception number */ #define EXCEPTION_EVENT 100 /* exception event code */ #define SYSCALL_EVENT 128 /* system call event code */ #define PROC_STACK_SIZE 4096 /* process stack size is 4 KB */ #define PF_ERR_PRES_BIT 1 /* page present bit position in PF error code */ #define PF_ERR_USER_SUP_BIT 4 /* user/supervisor bit position in PF error code */ #define TSS_PRESENT 8 /* TSS present bit position (used in TSS Desc) */

. . . typedef enum {KMODE_PROC, UMODE_PROC} proc_type_t; typedef struct /* process trap frame */ { unsigned short gs; /* segment registers */ unsigned short _notgs; unsigned short fs; unsigned short _notfs; unsigned short es; unsigned short _notes; unsigned short ds; unsigned short _notds; unsigned int EDI; /* general-purpose registers */

66

unsigned int ESI; unsigned int EBP; unsigned int ESP; unsigned int EBX; unsigned int EDX; unsigned int ECX; unsigned int EAX; unsigned int event_type; /* system call or exception event */ unsigned int error_code; /* error code for exception handlers */ unsigned int EIP; unsigned int CS; unsigned int EFL; unsigned int user_esp; /* user-mode stack pointer */ unsigned int user_ss; /* user-mode stack segment */ } proc_frame_t; typedef struct /* process control block */ { proc_type_t proc_type; state_t state; /* state of process (e.g. RUN, READY, etc.) */ . . . proc_frame_t *proc_frame_p; /* pointer to process’s trap frame */ } pcb_t; typedef struct { /* page descriptor */ int addr; /* address of the page */ int pid; /* PID of the process that owns the page */ } page_t;

#endif

File: data.h

/* Extern declarations of kernel data structures for process and memory management */

#ifndef _DATA_H_ #define _DATA_H_

#include “types.h”

. . .

67

extern int run_pid; extern pcb_t pcb[PROC_NUM]; extern char proc_kernel_stack[PROC_NUM][PROC_STACK_SIZE]; extern int kernel_cr3; extern page_t page[PAGE_NUM];

#endif

File: events.h

/* Function prototypes for process dispatching, event handling, loading the Task Register, etc. */

#ifndef _EVENTS_H_ #define _EVENTS_H_

#ifndef ASSEMBLER

__BEGIN_DECLS

. . .

void ProcLoader(proc_frame_t *); void LoadKernelTss(void); void SyscallEvent(void) void PageFaultEvent(void); void GenProtFaultEvent(void);

__END_DECLS

#endif

#endif

File: handlers.h

/* Interrupt and exception handler declarations */

#ifndef _HANDLERS_H_ #define _HANDLERS_H_

#include “types.h”

68

. . .

void PageFaultHandler(void); void GenProtFaultHandler(void); void NewUserProcHandler(func_p_t); void NewKernelProcHandler(func_p_t);

#endif

File: proc.h

/* Process function declarations */

#ifndef _PROC_H_ #define _PROC_H_ extern void SystemProc(void); extern void UserProc1(void); extern void UserProc2(void); extern void UserProc3(void);

#endif

File: handlers.c

/* Interrupt and exception handler definitions */

#include “spede.h” #include “types.h” #include “data.h” #include “tools.h” #include “proc.h” #include “handlers.h”

void GenProtFaultHandler(void) { unsigned error_code = pcb[run_pid].proc_frame_p->error_code;

if(error_code != 0) /* GPF was caused by a descriptor privilege violation */ { cons_printf(“Kernel Panic in GPF handler: general protection fault!

69

Error code 0x%x\n”, error_code); breakpoint(); } else { /* terminate the process here*/

cons_printf(“GPF handler: process ID %d was terminated for a protection violation\n”, run_pid); breakpoint(); } } void PageFaultHandler(void) { unsigned error_code = pcb[run_pid].proc_frame_p->error_code;

if(error_code & PF_ERR_USER_SUP_BIT) /* PF occurred in ring 3 */ { if((error_code & PF_ERR_PRES_BIT) == 0) /* PF not caused by a { protection violation */ /* if(process does not need an additional page) set present bit in page table for non-present page else { find a free page if(no free pages available) { cons_printf(“Kernel Panic in PF Handler: No free pages!\n”); breakpoint(); } allocate free page to process } */ } else /* Page Fault caused by a protection violation */ { /* terminate the process here */

cons_printf(“PF handler: process ID %d was terminated for a protection violation\n”, run_pid); breakpoint(); } }

70

else / * Page Fault occurred in ring 0 */ { cons_printf(“Kernel Panic in PF Handler: PF occurred in ring 0 by process ID %d\n”, run_pid); breakpoint(); } }

void NewUserProcHandler(func_pt_t p) { /* allocate PID for new user-mode process change process state to RUN enqueue PID onto the run queue point proc_frame_p into the top of the proc_kernel_stack - sizeof(proc_frame_t) */

pcb[pid].proc_type = UMODE_PROC; pcb[pid].proc_frame_p->ds = 0x3B; /* build trap frame */ pcb[pid].proc_frame_p->es = 0x3B; pcb[pid].proc_frame_p->fs = 0x3B; pcb[pid].proc_frame_p->gs = 0x3B; pcb[pid].proc_frame_p->user_ss = 0x43; /* user-mode stack segment */

/* user-mode stack segment points to virtual address 2 GB - sizeof(TF) */ pcb[pid].proc_frame_p->user_esp = 0x80000000 – sizeof(proc_frame_t); pcb[pid].proc_frame_p->EFL = get_eflags() | EF_INTR; /*enable interrupts*/ pcb[pid].proc_frame_p->CS = 0x33; pcb[pid].proc_frame_p->EIP = 0x40000000; */ 1 GB virtual address for text section */

/* allocate five free pages to new process copy proc_frame to the top of the stack page copy the new process’s code to the instruction page build page tables set the cr3 field in the new process’s PCB to the base of the page directory */ }

void NewKernelProcHandler(func_pt_t p) { /* allocate PID for new kernel-mode process change process state to RUN enqueue PID onto the run queue point proc_frame_p into the top of the proc_kernel_stack -

71

sizeof(TF) */

pcb[pid].proc_type = KMODE_PROC; /* build trap frame */ pcb[pid].proc_frame_p->ds = get_ds(); /* get kernel’s data seg selectors */ pcb[pid].proc_frame_p->es = get_es(); pcb[pid].proc_frame_p->fs = get_fs(); pcb[pid].proc_frame_p->gs = get_gs(); pcb[pid].proc_frame_p->EFL = get_eflags() | EF_INTR; /* enable interrupts*/ pcb[pid].proc_frame_p->CS = get_cs(); /* use kernel’s CS selector */ pcb[pid].proc_frame_p->EIP = (unsigned)p; /* func pointer to kernel proc function */

/* set the cr3 field in the new process’s PCB to kernel_cr3 */ }

File: events.S

/* Function definitions for process dispatching, event handling, loading the Task Register, etc. */

#include

.text

ENTRY(ProcLoader) movl 4(%esp), %ebx /* set esp to point to the base of the TF */ movl %ebx, %esp popl %gs /* pop data segment registers from TF */ popl %fs popl %es popl %ds popa /* pop general-purpose registers from TF */ addl $8, %esp /* skip over event_type and error_code in TF */ iret

ENTRY(SyscallEvent) pushl $0 /* dummy error_code */ pushl $128 /* system call event type */ jmp KernelEntry

ENTRY(GenProtFaultEvent) pushl $100 /* exception event */

72

movl $13, %ecx /* GEN_PROT_FAULT_EVENT */ jmp KernelEntry

ENTRY(PageFaultEvent) pushl $100 /* exception event */ movl $14, %ecx /* PAGE_FAULT_EVENT */ jmp KernelEntry

ENTRY(LoadKernelTss) movw $0x4B, %ax /* loads entry 9 from GDT (Kernel TSS Desc) */ ltr %ax ret

KernelEntry: pusha /* store general-purpose registers in TF */ pushl %ds /* store segment registers in TF */ pushl %es pushl %fs pushl %gs movl %esp, %ebx movw $0x10, %ax /* set data seg registers to kernel-mode selectors */ movw %ax, %ds movw %ax, %es movw %ax, %fs movw %ax, gs pushl %ebx /* push pointer to TF onto the stack for Kernel() function */ cld call CNAME(Kernel)

File: main.c

/* The main() function is the entry point of the OS. In this function, kernel data structures are initialized. */

#include “spede.h” #include “events.h” #include “proc.h” #include “handlers.h” #include “tools.h” #include “types.h”

char proc_kernel_stack[PROC_NUM][PROC_STACK_SIZE];

73

int run_pid; /* currently running process’s PID */ pcb_t pcb[PROC_NUM]; /* array of process control blocks */ page_t page[PAGE_NUM]; /* array of page descriptors */ int kernel_cr3; /* contains kernel’s CR3 register value */

. . . struct i386_tss kernel_tss; /* Kernel TSS */ struct i386_descriptor *GDT_p; /* pointer to Global Descriptor Table */ int main() { unsigned kernel_tss_limit; /* size of Kernel TSS */ struct pseudo_descriptor pseudo_desc; /* used to get base addr of GDT */ struct i386_gate *IDT_p; /* pointer to Interrupt Descriptor Table */

GDT_p = NULL; get_gdt(&pseudo_desc); /* get address of GDT from pseudo descriptor */ GDT_p = (struct i386_descriptor *) pseudo_desc.linear_base;

if(GDT_p == NULL) { cons_printf(“Kernel Panic: GDT_p is NULL!\n”); return 0; }

kernel_tss.tss_ss0 = 0x18; /* Build Kernel TSS */ kernel_tss.tss_cs = 0x8; kernel_tss.tss_ds = 0x10; kernel_tss.tss_es = 0x10; kernel_tss.tss_fs = 0x10; kernel_tss.tss_gs = 0x10; kernel_tss.tss_ss = 0x18;

kernel_tss_limit = sizeof(kernel_tss);

/* Add Kernel TSS Descriptor to GDT */ fill_descriptor(&GDT_p[9], (unsigned)&kernel_tss, kernel_tss_limit, TSS_PRESENT | ACC_TSS, 0x0);

LoadKernelTss(); /* Load Kernel TSS into Task Register */

. . .

74

IDT_p = get_idt_base();

/* Add Interrupt/Exception handler gates to the IDT */

fill_gate(&IDT_p[GEN_PROT_FAULT_EVENT], (int)GenProtFaultEvent, get_cs(), ACC_INTR_GATE, 0);

fill_gate(&IDT_p[PAGE_FAULT_EVENT], (int)PageFaultEvent, get_cs(), ACC_INTR_GATE, 0);

fill_gate(&IDT_p[SYSCALL_EVENT], (int)SyscallEvent, get_cs(), ACC_INTR_GATE | ACC_PL_U, 0);

. . .

kernel_cr3 = get_cr3(); /* get kernel’s cr3 value */

NewKernelProcHandler(SystemProc); /* create kernel-mode process */ ProcScheduler(); /* select a process to run (will update run_pid) */

/* set TSS_ESP0 in Kernel TSS to process’s kernel-mode stack */ kernel_tss.tss_esp0 = (unsigned)&proc_kernel_stack[run_pid][PROC_STACK_SIZE];

/* dispatch process */ ProcLoader(pcb[run_pid].proc_frame_p);

return 0; }

/* In the Kernel() function, the appropriate handler routine is invoked based on the event type. */ void Kernel(proc_frame_t *proc_frame_p) { . . .

if(pcb_frame_p->event_type == SYSCALL_EVENT) /* handle system call */ { . . . } else if(pcb_frame_p->event_type == EXCEPTION_EVENT) { if(proc_frame_p->ECX == PAGE_FAULT_EVENT)

75

PageFaultHandler(); else if(proc_frame_p->ECX == GEN_PROT_FAULT_EVENT) GenProtFaultHandler(); }

. . .

/* if(key was pressed on the target keyboard) { if(key == ‘n’) NewUserProcHandler(UserProc1); . . . } */

ProcScheduler(); /* select a process to run (will update run_pid) */

/* set ESP0 in Kernel TSS to process’s kernel-mode stack */ kernel_tss.tss_esp0 = (unsigned)&proc_kernel_stack[run_pid][PROC_STACK_SIZE];

/* dispatch process */ ProcLoader(pcb[run_pid].proc_frame_p); }

File: proc.c

/* Process function definitions */

#include “spede.h” #include “proc.h” #include “syscalls.h” #include “tools.h” #include “data.h” void SystemProc(void) { . . . } void UserProc1(void) { . . . } void UserProc2(void) { . . . } void UserProc3(void) { . . . }

76

References

[1] L. Das, The x86 . New Delhi, India: Dorling Kindersley, 2010, p. 213.

[2] B. Witt and J. Clevenger, "SPEDE Lab Manual: Chapter 13 – User Mode Tasks (draft)", athena.ecs.csus.edu, 2018. [Online]. Available: http://athena.ecs.csus.edu/~changw/159/UserMode.pdf. [Accessed: 23-Jan-2018].

[3] “X86”, en.wikipedia.org, 2018. [Online]. Available: https://en.wikipedia.org/wiki/X86. [Accessed: 11-Feb-2018].

[4] “X86 Assembly/Protected Mode – Wikibooks, open books for an open world”, en.wikibooks.org, 2018. [Online]. Available: https://en.wikibooks.org/wiki/X86_Assembly/Protected_Mode. [Accessed: 11-Feb- 2018].

[5] Intel 64 and IA-32 Architectures Software Developer’s Manual (Volume 1: Basic Architecture). December 2017, Chapter 3 Page 9.

[6] Intel 64 and IA-32 Architectures Software Developer’s Manual (Volume 1: Basic Architecture). December 2017, Chapter 2 Page 1.

[7] “Global Descriptor Table”, en.wikipedia.org, 2018. [Online]. Available: https://en.wikipedia.org/wiki/Global_Descriptor_Table. [Accessed: 07-Feb-2018].

[8] “Interrupt descriptor table”, en.wikipedia.org, 2018. [Online]. Available: http://en.wikipedia.org/wiki/Interrupt_descriptor_table. [Accessed: 03-Mar-2018].

[9] Intel 64 and IA-32 Architectures Software Developer’s Manual (Volume 3: System Programming Guide). December 2017, Chapter 5 Page 8.

[10] Software.intel.com, 2018. [Online]. Available: https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd- 3abcd.pdf [Accessed: 03-Apr-2018].

[11] “Protection ring”, en.wikipedia.org, 2018. [Online]. Available: https://en.wikipedia.org/wiki/Protection_ring. [Accessed: 14-Feb-2018].

[12] Intel 64 and IA-32 Architectures Software Developer’s Manual (Volume 3: System Programming Guide). December 2017, Figure 7-2 on Chapter 7 Page 4.

[13] T. Shanley, Protected Mode Software Architecture. Boston, [Mass.]: Addison- Wiley, 1996, pp. 158-159.

77

[14] M. Posch, Mastering C++ Multithreading. Birmingham: Packt Publishing, 2017, p. 21.

[15] “Context switch”, en.wikipedia.org, 2018. [Online]. Available: https://en.wikipedia.org/wiki/Context_switch#Hardware_vs._software. [Accessed: 08- Apr-2018].

[16] “Paging – OSDev Wiki”, wiki.osdev.org, 2018. [Online]. Available: https://wiki.osdev.org/Paging. [Accessed: 01-Mar-2018].

[17] “80386 Programmer’s Reference Manual – Section 9.6”, pdos.csail.mit.edu, 2018. [Online]. Available: https://pdos.csail.mit.edu/6.8.28/2010/readings/i386/s09_06.htm. [Accessed: 23-Mar-2018].

[18] J. Clevenger and B. Witt, SPEDE-2000 Lab Manual for Computer Science / Computer Engineering 159 On the Pentium Platform. Sacramento: CSUS Foundation, 2010.

[19] “System Idle Process”, en.wikipedia.org, 2018. [Online]. Available: https://en.wikipedia.org/wiki/System_Idle_Process. [Accessed: 28-Mar-2018].

[20] “Exceptions – OSDev Wiki”, wiki.osdev.org, 2018. [Online]. Available: https://wiki.osdev.org/Exceptions#Page_Fault. [Accessed: 23-Mar-2018].

[21] “Exceptions – OSDev Wiki”, wiki.osdev.org, 2018. [Online]. Available: https://wiki.osdev.org/Exceptions#General_Protection_Fault. [Accessed: 23-Mar-2018].

[22] “Exceptions – OSDev Wiki”, wiki.osdev.org, 2018. [Online]. Available: https://wiki.osdev.org/Exceptions. [Accessed: 27-Mar-2018].

[23] J.Durairaj, “File Manager: Disk Driver Interface for Operating System Pragmatics CSC 159” (Master’s Project Report), California State University, Sacramento, 2013.

[24] “Index of //gdb”, ftp.gnu.org, 2018. [Online]. Available: https://ftp.gnu.org/gnu/gdb/. [Accessed: 08-Apr-2018].