An x86 Protected Mode Virtual Machine Monitor for the MIT Exokernel

by Charles L. Coffing

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degrees of Bachelor of Science in Computer Science and Engineering and Master of Engineering in Computer Science at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY

May 1999

© Charles L. Coffing, MCMXCIX. All rights reserved.

The author hereby grants to MIT permission to reproduce and distribute publicly paper and electronic copies of this thesis and to grant others the right to do so.

Author: Department of Electrical Engineering and Computer Science, May 21, 1999
Certified by: M. Frans Kaashoek, Associate Professor, Thesis Supervisor
Accepted by: Arthur C. Smith, Chairman, Department Committee on Graduate Students

An x86 Protected Mode Virtual Machine Monitor for the MIT Exokernel

by Charles L. Coffing

Submitted to the Department of Electrical Engineering and Computer Science on May 21, 1999, in partial fulfillment of the requirements for the degrees of Bachelor of Science in Computer Science and Engineering and Master of Engineering in Computer Science

Abstract

This thesis presents the design and implementation of an x86 virtual machine monitor that allows multiple operating systems to run concurrently under xok, MIT's x86-based exokernel. The monitor and modified xok demonstrate how to share several processor-defined structures, such as the GDT, LDT, and IDT, between multiple operating systems. Xok was modified to allow controlled modifications to these structures. Linux was used as the reference guest operating system. The guest operating system is part of the trusted code base, but the monitor itself is not. Xok, the monitor, and the guest operating system coexist on the same pagetable to minimize TLB flushes.
Real-mode virtualizations of disk and video devices were done. The resulting monitor was benchmarked against several other solutions, and thoughts for future improvements are presented.

Thesis Supervisor: M. Frans Kaashoek
Title: Associate Professor

Contents

1 Introduction
    1.1 Background
    1.2 Terminology
    1.3 Design strategy
    1.4 Thesis organization
2 Related Work
    2.1 Full emulation
    2.2 Binary emulation
    2.3 Virtual machine monitor
3 x86 Background
    3.1 History
    3.2 Operating modes
    3.3 Segmentation
    3.4 Paging
    3.5 Privilege levels
    3.6 Registers
    3.7 Task management
    3.8 Interrupts and exceptions
4 Problems Virtualizing the x86
    4.1 Privilege levels and segmentation
    4.2 Segment descriptor cache
    4.3 Paging
    4.4 Untrappable instructions
    4.5 Summary of virtualization challenges
5 Design
    5.1 Assumptions
    5.2 Solution overview
    5.3 Solution
        5.3.1 Emulation model
        5.3.2 Handler placement
        5.3.3 Segmentation
        5.3.4 Privilege levels
    5.4 Multiple guest operating systems
6 Memory Management
    6.1 Requirements
        6.1.1 Efficiency
        6.1.2 Speed
        6.1.3 Correctness
    6.2 Design
        6.2.1 Setup
        6.2.2 Page table management
        6.2.3 Changing page tables
        6.2.4 Page faults
        6.2.5 Tracking protected structures
        6.2.6 Virtual address collisions
    6.3 Imperfect virtualization
        6.3.1 Monitor location
        6.3.2 Four megabyte pages
        6.3.3 Guests
7 Segments
    7.1 Xok segmentation
    7.2 Descriptor tables
        7.2.1 GDT
        7.2.2 LDT
        7.2.3 Trapping descriptor table modifications
        7.2.4 Segment descriptor cache
    7.3 Imperfect virtualization
        7.3.1 Descriptor tables
        7.3.2 Privilege levels and segmentation
        7.3.3 Segment descriptor cache
8 Interrupts and Exceptions
    8.1 Xok interrupt descriptor table
    8.2 Potential solutions to share the IDT
        8.2.1 Switching the IDT
        8.2.2 Multiplexing the IDT
    8.3 Multiplexing the IDT
        8.3.1 Catching guest exceptions and interrupts
        8.3.2 Handling exceptions in the monitor
        8.3.3 Emulating interrupt, task, and trap gates
        8.3.4 Returning to the guest
        8.3.5 Summary
    8.4 PIC virtualization
    8.5 Imperfect virtualization
9 Device Virtualization
    9.1 BIOS
    9.2 Keyboard
    9.3 Video
    9.4 Disk
10 Monitor Implementation
    10.1 Exception handling
    10.2 Compilation
    10.3 Configuration
    10.4 Debugger
    10.5 Usage
    10.6 Status
11 Xok
    11.1 Helpful xok features
        11.1.1 User defined trap handlers
        11.1.2 Batch system calls
    11.2 Security of the modified xok
12 Performance
    12.1 Benchmarking methodology
    12.2 Benchmarks
        12.2.1 Computation: gzip
        12.2.2 Page table insertions: multiple pte
        12.2.3 Page table insertions: single pte
        12.2.4 Comprehensive: boot
    12.3 Results and Analysis
        12.3.1 Monitor invocation costs
13 Conclusion
    13.1 Future work
        13.1.1 Memory
        13.1.2 Supported guest operating systems
        13.1.3 Virtualized devices
        13.1.4 Hardware interrupts
    13.2 Conclusions
A Tables
B Figures

List of Figures

6-1 Virtual memory footprint of Linux overlaid on xok
B-1 Interrupt and exception handling from guest application using user-defined handlers.
B-2 Interrupt and exception handling from guest application.
B-3 Interrupt and exception handling from guest operating system.
B-4 Minimum cost of trapping from a guest operating system, emulating an instruction with xok system calls, and returning.

List of Tables

2.1 Speed comparison of some emulation techniques
5.1 Solution overview
8.1 Xok interrupt descriptor table
8.2 Gaining control of a guest interrupt or exception
9.1 Data tables in the virtualized BIOS
10.1 Basic debugger commands
12.1 Speed comparison of four environments, in millions of cycles
12.2 Cost breakdown of monitor invocation, in cycles
A.1 Summary of new xok system calls

Chapter 1

Introduction

1.1 Background

The paradigm of running exactly one operating system per personal computer is being threatened. The idea of running more than one operating system simultaneously is attracting interest, academically and commercially, because there now exists not only the power to do so but also the need. As Gordon Moore observed in 1965, the transistor counts and processing power of chips tend to double at a steady pace, popularly quoted as every 18 months. This trend, now known as Moore's Law, has held up well for thirty years and likely will continue in the foreseeable future. Additionally, in recent years the choice in PC operating systems has increased significantly beyond the mass-market standard of Windows. Multimedia-oriented operating systems are appearing and growing in importance, such as BeOS, which was first released on x86 in 1997. Linux, an open-source UNIX clone for personal computers, was first written in 1991 but had an explosion of popularity in 1998, with shipments growing 212% for the year in the server market, according to a study by International Data Corporation. In addition, numerous research operating systems are being or have been developed for the x86, each with its own advantages.

There are numerous reasons for wanting to run multiple operating systems on a single PC. The most obvious reason is an economic one. As PCs become more powerful, an increasing number of uses do not fully tax the processor. Instead of having a separate machine for each task, a new PC should be able to handle all tasks. Having the ability to simultaneously