
VU Research Portal

Building a dependable operating system: fault tolerance in MINIX 3
Herder, J.N.
2010

Document version: Publisher's PDF, also known as Version of record
Citation for published version (APA): Herder, J. N. (2010). Building a dependable operating system: fault tolerance in MINIX 3.

BUILDING A DEPENDABLE OPERATING SYSTEM: FAULT TOLERANCE IN MINIX 3

VRIJE UNIVERSITEIT

ACADEMIC DISSERTATION
for the degree of Doctor at the Vrije Universiteit Amsterdam, by authority of the rector magnificus, prof.dr. L.M. Bouter, to be defended in public before the doctoral committee of the Faculty of Exact Sciences on Thursday 9 September 2010 at 13.45 in the auditorium of the university, De Boelelaan 1105

by
JORRIT NIEK HERDER
born in Alkmaar

supervisor (promotor): prof.dr. A.S. Tanenbaum
co-supervisor (copromotor): dr.ir. H.J. Bos

Advanced School for Computing and Imaging
This work was carried out in the ASCI graduate school. ASCI dissertation series number 208.
This research was supported by the Netherlands Organisation for Scientific Research (NWO) under project number 612-060-420.

“We’re getting bloated and huge. Yes, it’s a problem. ... I’d like to say we have a plan.”
Linus Torvalds on the Linux kernel, 2009

Copyright © 2010 by Jorrit N. Herder
ISBN 978-94-6108-058-5

Parts of this thesis have been published before:
ACM SIGOPS Operating System Review, 40(1) and 40(3)
USENIX ;login:, 31(2), 32(1), and 35(3)
IEEE Computer, 39(5)
Springer Lecture Notes in Computer Science, 4186
Springer Real-Time Systems, 43(2)
Proc. 6th European Dependable Computing Conf. (EDCC’06)
Proc. 37th IEEE/IFIP Int’l Conf. on Dependable Systems and Networks (DSN’07)
Proc. 14th IEEE Pacific Rim Int’l Symp. on Dependable Computing (PRDC’08)
Proc. 39th IEEE/IFIP Int’l Conf. on Dependable Systems and Networks (DSN’09)
Proc. 4th Latin-American Symp. on Dependable Computing (LADC’09)

Contents

ACKNOWLEDGEMENTS
SAMENVATTING
1 GENERAL INTRODUCTION
1.1 The Need for Dependability
1.2 The Problem with Device Drivers
1.3 Why do Systems Crash?
1.3.1 Software Complexity
1.3.2 Design Flaws
1.4 Improving OS Dependability
1.4.1 A Modular OS Design
1.4.2 Fault-tolerance Strategies
1.4.3 Other Benefits of Modularity
1.5 Preview of Related Work
1.6 Focus of this Thesis
1.7 Outline of this Thesis
2 ARCHITECTURAL OVERVIEW
2.1 The MINIX Operating System
2.1.1 Historical Perspective
2.1.2 Multiserver OS Structure
2.1.3 Interprocess Communication
2.2 Driver Management
2.3 Isolating Faulty Drivers
2.3.1 Isolation Architecture
2.3.2 Hardware Considerations
2.4 Recovering Failed Drivers
2.4.1 Defect Detection and Repair
2.4.2 Assumptions and Limitations
2.5 Fault and Failure Model
3 FAULT ISOLATION
3.1 Isolation Principles
3.1.1 The Principle of Least Authority
3.1.2 Classification of Privileged Operations
3.1.3 General Rules for Isolation
3.2 User-level Driver Framework
3.2.1 Moving Drivers to User Level
3.2.2 Supporting User-level Drivers
3.3 Isolation Techniques
3.3.1 Restricting CPU Usage
3.3.2 Restricting Memory Access
3.3.3 Restricting Device I/O
3.3.4 Restricting IPC
3.4 Case Study: Living in Isolation
4 FAILURE RESILIENCE
4.1 Defect Detection Techniques
4.1.1 Unexpected Process Exits
4.1.2 Periodic Status Monitoring
4.1.3 Explicit Update Requests
4.2 On-the-fly Repair
4.2.1 Recovery Scripts
4.2.2 Restarting Failed Components
4.2.3 State Management
4.3 Effectiveness of Recovery
4.3.1 Recovering Device Drivers
4.3.2 Recovering System Servers
4.4 Case Study: Monitoring Driver Correctness
4.5 Case Study: Automating Server Recovery
5 EXPERIMENTAL EVALUATION
5.1 Software-implemented Fault Injection
5.1.1 SWIFI Test Methodology
5.1.2 Network-device Driver Results
5.1.3 Block-device Driver Results
5.1.4 Character-device Driver Results
5.2 Performance Measurements
5.2.1 Costs of Fault Isolation
5.2.2 Costs of Failure Resilience
5.3 Source-code Analysis
5.3.1 Evolution of MINIX 3
5.3.2 Evolution of Linux 2.6
6 RELATED WORK
6.1 In-kernel Sandboxing
6.1.1 Hardware-enforced Protection
6.1.2 Software-based Isolation
6.2 Virtualization Techniques
6.2.1 Full Virtualization
6.2.2 Paravirtualization
6.3 Formal Methods
6.3.1 Language-based Protection
6.3.2 Driver Synthesis
6.4 User-level Frameworks
6.4.1 Process Encapsulation
6.4.2 Split-driver Architectures
6.5 Comparison
7 SUMMARY AND CONCLUSION
7.1 Summary of this Thesis
7.1.1 Problem Statement and Approach
7.1.2 Fault-tolerance Techniques
7.2 Lessons Learned
7.2.1 Dependability Challenges
7.2.2 Performance Perspective
7.2.3 Engineering Effort
7.3 Epilogue
7.3.1 Contribution of this Thesis
7.3.2 Application of this Research
7.3.3 Directions for Future Research
7.4 Availability of MINIX 3
REFERENCES
ABBREVIATIONS
PUBLICATIONS
BIOGRAPHY

List of Figures

1.1 Fundamental role of the OS in a computer system
1.2 Growth of the Linux 2.6 kernel and its major subsystems
1.3 Lack of fault isolation in a monolithic design
1.4 Independent processes in a multiserver design
2.1 Multiserver design of the MINIX 3 operating system
2.2 IPC primitives implemented by the MINIX 3 kernel
2.3 Format of fixed-length IPC messages in MINIX 3
2.4 Parameters supported by the service utility
2.5 Starting new drivers is done by the driver manager
2.6 Resources that can be configured via isolation policies
2.7 Hardware protection provided by MMU and IOMMU
2.8 Failed drivers can be automatically restarted
3.1 Classification of privileged driver operations
3.2 Asymmetric trust and vulnerabilities in synchronous IPC
3.3 Overview of new kernel calls for device drivers
3.4 Structure of memory grants and grant flags
3.5 Hierarchical structure of memory grants
3.6 IPC patterns to deal with asymmetric trust
3.7 Per-driver policy definition is done using simple text files
3.8 Interactions of an isolated driver and the rest of the OS
4.1 Classification of defect detection techniques in MINIX 3
4.2 Example of a parameterized, generic recovery script
4.3 Procedure for restarting a failed device driver
4.4 Summary of the data store API for state management
4.5 Driver I/O properties and recovery support
4.6 Components that have to be aware of driver recovery
4.7 A filter driver can check for driver protocol violations
4.8 On-disk checksumming layout used by the filter driver
4.9 Dedicated recovery script for the network server (INET)
5.1 Fault types and code mutations used for SWIFI testing
5.2 Driver configurations subjected to SWIFI testing
5.3 Network driver failure counts for each fault type
5.4 Network driver failure reasons for each fault type
5.5 Unauthorized access attempts found in the system logs
5.6 Faults needed to disrupt and crash the DP8390 driver
5.7 Faults needed to disrupt and crash the RTL8139 driver
5.8 Selected bugs found during SWIFI testing
5.9 Results of seven SWIFI tests with the ATWINI driver
5.10 Audio playback while testing the ES1371 driver
5.11 System call times for in-kernel versus user-level drivers
5.12 Raw reads for in-kernel versus user-level drivers
5.13 File reads for in-kernel versus user-level drivers
5.14 Application-level benchmarks using the filter driver
5.15 Cross-platform comparison of disk read performance
5.16 Network throughput with and without driver recovery
5.17 Disk throughput with and without driver recovery
5.18 Evolution of the MINIX 3 kernel, drivers, and servers
5.19 Line counts for the most important MINIX 3 components
5.20 Evolution of the Linux 2.6 kernel and device drivers
5.21 Linux 2.6 driver growth in lines of executable code
6.1 Hardware-enforced protection domains in Nooks
6.2 Software-based fault-isolation procedure in BGI
6.3 Full