Flexible Operating System Internals: the Design and Implementation of the Anykernel and Rump Kernels Aalto University

Department of Computer Science and Engineering Aalto- Antti Kantee DD Flexible Operating System 171 / 2012 2012 Internals: FlexibleSystemOperating Internals: TheDesign andImplementation theofandAnykernelRump Kernels The Design and Implementation of the Anykernel and Rump Kernels Antti Kantee 9HSTFMG*aejbgi+ ISBN 978-952-60-4916-8 BUSINESS + ISBN 978-952-60-4917-5 (pdf) ECONOMY ISSN-L 1799-4934 ISSN 1799-4934 ART + ISSN 1799-4942 (pdf) DESIGN + ARCHITECTURE UniversityAalto Aalto University School of Science SCIENCE + Department of Computer Science and Engineering TECHNOLOGY www.aalto.fi CROSSOVER DOCTORAL DOCTORAL DISSERTATIONS DISSERTATIONS "BMUP6OJWFSTJUZQVCMJDBUJPOTFSJFT %0$503"-%*44&35"5*0/4 &2#&ŗ*,.#(!ŗ3-.'ŗ (.,(&-Ćŗ "ŗ-#!(ŗ(ŗ '*&'(..#)(ŗ) ŗ."ŗ (3%,(&ŗ(ŗ/'*ŗ ,(&-ŗ "OUUJ,BOUFF "EPDUPSBMEJTTFSUBUJPODPNQMFUFEGPSUIFEFHSFFPG%PDUPSPG 4DJFODFJO5FDIOPMPHZUPCFEFGFOEFE XJUIUIFQFSNJTTJPOPGUIF "BMUP6OJWFSTJUZ4DIPPMPG4DJFODF BUBQVCMJDFYBNJOBUJPOIFMEBU UIFMFDUVSFIBMM5PGUIFTDIPPMPO%FDFNCFSBUOPPO "BMUP6OJWFSTJUZ 4DIPPMPG4DJFODF %FQBSUNFOUPG$PNQVUFS4DJFODFBOE&OHJOFFSJOH 4VQFSWJTJOHQSPGFTTPS 1SPGFTTPS)FJLLJ4BJLLPOFO 1SFMJNJOBSZFYBNJOFST %S.BSTIBMM,JSL.D,VTJDL 64" 1SPGFTTPS3FO[P%BWPMJ 6OJWFSTJUZPG#PMPHOB *UBMZ 0QQPOFOU %S1FUFS5SÍHFS )BTTP1MBUUOFS*OTUJUVUF (FSNBOZ "BMUP6OJWFSTJUZQVCMJDBUJPOTFSJFT %0$503"-%*44&35"5*0/4 "OUUJ,BOUFFQPPLB!JLJGJ 1FSNJTTJPOUPVTF DPQZ BOEPSEJTUSJCVUFUIJTEPDVNFOUXJUIPS XJUIPVUGFFJTIFSFCZHSBOUFE QSPWJEFEUIBUUIFBCPWFDPQZSJHIU OPUJDFBOEUIJTQFSNJTTJPOOPUJDFBQQFBSJOBMMDPQJFT%JTUSJCVUJOH NPEJGJFEDPQJFTJTQSPIJCJUFE *4#/ QSJOUFE *4#/ QEG *44/- *44/ QSJOUFE *44/ QEG IUUQVSOGJ63/*4#/ 6OJHSBGJB0Z )FMTJOLJ 'JOMBOE "CTUSBDU "BMUP6OJWFSTJUZ 10#PY '*"BMUPXXXBBMUPGJ "VUIPS (..#ŗ (.ŗ /BNFPGUIFEPDUPSBMEJTTFSUBUJPO &2#&ŗ*,.#(!ŗ3-.'ŗ (.,(&-Ćŗ "ŗ-#!(ŗ(ŗ '*&'(..#)(ŗ) ŗ."ŗ(3%,(&ŗ(ŗ/'*ŗ ,(&-ŗ 1VCMJTIFS"))&ŗ) ŗ#(ŗ 6OJU*,.'(.ŗ) ŗ)'*/.,ŗ#(ŗ(ŗ(!#(,#(!ŗ 4FSJFT&.)ŗ(#0,-#.3ŗ*/&#.#)(ŗ-,#-ŗ ŗ ŗûāûĞüúûüŗ 'JFMEPGSFTFBSDI) .1,ŗ3-.'-ŗ .BOVTDSJQUTVCNJUUFEüûŗ/!/-.ŗüúûüŗ %BUFPGUIFEFGFODFāŗ',ŗüúûüŗ 1FSNJTTJPOUPQVCMJTIHSBOUFE EBUF ăŗ.),ŗüúûüŗ -BOHVBHF(!&#-"ŗ .POPHSBQI "SUJDMFEJTTFSUBUJPO TVNNBSZ PSJHJOBMBSUJDMFT "CTUSBDU "ŗ')()&#."#ŗ%,(&ŗ,"#../,ŗ#-ŗ-#!(#ð(.ŗ#(ŗ."ŗ,&ŗ1),&ŗ/ŗ.)ŗ."ŗ&,!ŗ')/(.ŗ ) ŗ1),%#(!ŗ(ŗ*,)0(ŗ)Ąŗ)10,ąŗ."ŗ,"#../,ŗ#-ŗ().ŗ1#.")/.ŗ*,)&'-Ćŗ.-.#(!ŗ(ŗ 0&)*'(.ŗ#-ŗ# ð/&.ąŗ0#,./&#4#(!ŗ%,(&ŗ-,0#-ŗ(ŗŗ)(ŗ)(&3ŗ3ŗ/*&#.#(!ŗ."ŗ(.#,ŗ %,(&ąŗ(ŗ-/,#.3ŗ#-ŗ1%ŗ/ŗ.)ŗŗ-#(!&ŗ)'#(ŗ1",ŗ&&ŗ)ŗ"-ŗ#,.ŗ--ŗ.)ŗ 0,3."#(!Ąŗ&.,(.ŗ%,(&ŗ,"#../,-ŗ-/"ŗ-ŗ."ŗ'#,)%,(&ŗ(ŗ2)%,(&ŗ"0ŗ(ŗ *,)*)-ŗ.)ŗ,.# 3ŗ."-ŗ*,)&'-ŗ1#."ŗ')()&#."#ŗ%,(&-Ąŗ)10,ąŗ&.,(.ŗ-3-.'ŗ -.,/./,-ŗ)ŗ().ŗ,--ŗ."ŗ)'')(ŗ-#,ŗ) ŗ/-#(!ŗŗ')()&#."#ŗ%,(&ŗ1"(ŗ."ŗ )0'(.#)(ŗ*,)&'-ŗ)ŗ().ŗ**&3Ąŗ ŗ ŗ*,)*)-ŗŗó2#&ŗ(3%,(&ŗ,"#../,ŗ1"#"ŗ(&-ŗ,/((#(!ŗ%,(&ŗ,#0,-ŗ#(ŗŗ0,#.3ŗ ) ŗ)(ð!/,.#)(-ąŗ2'*&-ŗ) ŗ1"#"ŗ#(&/ŗ'#,)%,(&Ě-.3&ŗ-,0,-ŗ(ŗ**&#.#)(ŗ &#,,#-Ąŗŗ')()&#."#ŗ%,(&ŗ#-ŗ-")1(ŗ.)ŗŗ)(0,.#&ŗ#(.)ŗ(ŗ(3%,(&ŗ1#."ŗŗ,-)(&ŗ ')/(.ŗ) ŗ ),.ŗ(ŗ1#.")/.ŗ#(.,)/#(!ŗ*, ),'(Ě"#(,#(!ŗ#(#,.#)(ŗ&3,-Ąŗ"ŗ ),#!#(&ŗ')()&#."#ŗ')ŗ) ŗ)*,.#)(ŗ#-ŗ*,-,0ŗ .,ŗ."ŗ(3%,(&ŗ$/-.'(.-ąŗ(ŗ &.,(.ŗ')-ŗ) ŗ)*,.#)(ŗ ),ŗ,#0,-ŗ,ŗ0#&&ŗ-ŗŗ,/(.#'ŗ")#Ąŗ),ŗ()(Ě')()&#."#ŗ ')-ŗ) ŗ)*,.#)(ąŗ."ŗ,/'*ŗ%,(&ŗ#-ŗ#(.,)/ŗ-ŗŗ&#!".1#!".ŗ)(.#(,ŗ ),ŗ,#0,-Ąŗŗ ,/'*ŗ%,(&ŗ,/(-ŗ)(ŗ.)*ŗ) ŗŗ"3*,0#-),ŗ1"#"ŗ) ,-ŗ"#!"Ě&0&ŗ*,#'#.#0-ŗ-/"ŗ-ŗ.",ŗ -"/&#(!ŗ(ŗ0#,./&ŗ''),3Ąŗŗ*,)/.#)(ŗ+/&#.3ŗ#'*&'(..#)(ŗ ),ŗ."ŗ.ŗ)*(ŗ -)/,ŗŗ"-ŗ(ŗ)(Ąŗ"ŗ(3%,(&ŗ,"#../,ŗ(ŗ,/'*ŗ%,(&-ŗ,ŗ0&/.ŗ)."ŗ !#(-.ŗ )/,ŗ3,-ŗ) ŗ,&Ě1),&ŗ2*,#(ŗ ,)'ŗ#&3ŗ.ŗ0&)*'(.ŗ-ŗ1&&ŗ-ŗ!#(-.ŗ -3(.".#ŗ("',%-Ąŗ ,FZXPSET)*,.#(!ŗ-3-.'ŗ%,(&ŗ,"#../,ąŗ#'*&'(..#)(ąŗ)*(ŗ-)/,ŗ *4#/ QSJOUFE ăāĂĚăÿüĚĀúĚþăûĀĚĂŗ *4#/ QEG ăāĂĚăÿüĚĀúĚþăûāĚÿŗ *44/-ûāăăĚþăýþŗ *44/ QSJOUFE ûāăăĚþăýþŗ *44/ QEG ûāăăĚþăþüŗ -PDBUJPOPGQVCMJTIFS-*))ŗ -PDBUJPOPGQSJOUJOH&-#(%#ŗ :FBSüúûüŗ 1BHFTýÿĂŗ VSO"..*ĆĞĞ/,(ĄðĞĆ ĆăāĂĚăÿüĚĀúĚþăûāĚÿŗ 5JJWJTUFMN» "BMUPZMJPQJTUP 1- "BMUPXXXBBMUPGJ 5FLJK» (..#ŗ (.ŗ 7»JUÍTLJSKBOOJNJ )/-.0.ŗ%<3.. $<,$-.&'<.Ćŗ )%3.#'(ŗ$ŗ3(%<3.#'#(ŗ-//((#..&/ŗ$ŗ.)././-ŗ +VMLBJTJKB,/-.#.#(ŗ%),%%)/&/ŗ :LTJLLÍ#.).%(##%(ŗ&#.)-ŗ 4BSKB&.)ŗ(#0,-#.3ŗ*/&#.#)(ŗ-,#-ŗ ŗ ŗûāûĞüúûüŗ 5VULJNVTBMB"$&'#-.)$<,$-.&'<.ŗ ,»TJLJSKPJUVLTFOQWNüûĄúĂĄüúûüŗ 7»JUÍTQ»JW»úāĄûüĄüúûüŗ +VMLBJTVMVWBONZÍOU»NJTQ»JW»úăĄûúĄüúûüŗ ,JFMJ(!&(.#ŗ .POPHSBGJB :IEJTUFMN»W»JUÍTLJSKB ZIUFFOWFUPPTB FSJMMJTBSUJLLFMJU 5JJWJTUFMN» )()&##..#-(ŗ3.#'(ŗ%<3.. $<,$-.&'<.ŗ)0.ŗ',%#..<0#<ąŗ%)-%ŗ(ŗ-#-<&.<0<.ŗ-//,(ŗ'<<,<(ŗ )&'--)&0#ŗ$ŗ.-...$ŗ$/,#.Ąŗ )()&##..#((ŗ,%%#.".//,#ŗ#ŗ%/#.(%(ŗ)&ŗ)(!&'.)(Ćŗ 0#,./&#-)#(.#ŗ)((#-.//ŗ0#(ŗ0#,./&#-)#'&&ŗ3#(ŗ%)%)(#-//.(ąŗ3.#'(ŗ.-./-ŗ)(ŗ0#%ŗ $ŗ.#.)./,0ŗ)(ŗ"#%%)ŗ$)"./(ŗ-##.<ąŗ..<ŗ3.#'--<ŗ-/),#...0&&ŗ)"$&'#-.)&&ŗ)(ŗ-/),ŗ*<<-3ŗ %<3.. $<,$-.&'<(ŗ%#%%##(ŗ)-##(Ąŗ#".)".)#-#ŗ3#(,%%#.".//,#.ąŗ%/.(ŗ'#%,)3#(ŗ$ŗ %-)3#(ąŗ)(ŗ").../ŗ%),$'(ŗ')()&##..#-(ŗ,%%#.".//,#(ŗ)(!&'#Ąŗ#".)".)#-#--ŗ '&&#--ŗ)(ŗ%/#.(%#(ŗ)(!&'(ŗ-ąŗ..#0<.ŗ(ŗ./ŗ3&#-.<ŗ"&/ŗ%<3..<<ŗ')()&##..#-.ŗ3#(.<ąŗ %/(ŗ3&&<'#(#./.ŗ)(!&'.ŗ#0<.ŗ)&ŗ,&0(..$Ąŗ ŗ <--<ŗ.3 --<ŗ&/)(ŗ$)/-.0ŗ$)%3#(Ąŗŗ'")&&#-.ŗ$/,#(ŗ-/),#..'#-(ŗ')(#--ŗ %)(ð!/,.#)#--ąŗ-#',%%$<ŗ$)#-.ŗ)0.ŗ'#%,)3#(Ě.33&#-.ŗ*&0&#'.ŗ$ŗ -)0&&/-%#,$-.).Ąŗ )()&##..#-(ŗ3.#'(ŗ'//(.'#-(ŗ$)%3.#'%-#ŗ(<3..<<(ŗ)((#-./0(ŗ -<<3&&#-&&<ŗ0#0&&ŗ#&'(ŗ-/),#./-%3%3<ŗ"#%(.<0#<ŗ-.,%.#)#.Ąŗ ")&&#-//-ŗ-/),#..ŗ $/,#.ŗ')()&##..#---<ŗ3.#'--<ŗ-<#&33ŗ$ŗ'//.ŗ-/),#./-%)(ð!/,.#).ŗ.,$).(ŗ$)(#%#-(ŗ 0&#(.(Ąŗ /#.ŗ%/#(ŗ')()&##..#-#-.ŗ-/),#./-%)(ð!/,.#).ŗ0,.(ŗ'<<,#.&&<<(ŗ.3(%<3#(ąŗ $)%ŗ)(ŗ$)3'*<,#-. ŗ$/,#&&Ąŗ3(%<3.#'(ŗ-/),#./-ŗ.*".//ŗ"3*,%#".#'(ŗ *<<&&<Ąŗ3*,%#"#(ŗ.,$)ŗ.3(%<3.#'&&ŗ%),%(ŗ.-)(ŗ*&0&/#.ŗ%/.(ŗ-<#-%/&)#((#(ŗ $ŗ0#,./&#'/#-.#(Ąŗ)././-ŗ)(ŗ.".3ŗ./).(.).-)#-(ŗ.Ě(#'#-&&ŗ0)#'(ŗ &<"%))#(ŗ%<3.. $<,$-.&'<&&Ąŗ )%3#(,%%#.".//,#ŗ$ŗ.3(%<3.#'.ŗ,0#)#(ŗ-%<ŗ(&$<(ŗ 0/)(ŗ,&#%)%'/%-(ŗ..<ŗ-3(...#-.(ŗ%)%#(ŗ*,/-.&&Ąŗ ŗ "WBJOTBOBU%<3.. $<,$-.&'<3#(,%%#.".//,#ąŗ.)././-ąŗ0)#(ŗ&<"%))#ŗ *4#/ QBJOFUUV ăāĂĚăÿüĚĀúĚþăûĀĚĂŗ *4#/ QEG ăāĂĚăÿüĚĀúĚþăûāĚÿŗ *44/-ûāăăĚþăýþŗ *44/ QBJOFUUV ûāăăĚþăýþŗ *44/ QEG ûāăăĚþăþüŗ +VMLBJTVQBJLLB-*))ŗ 1BJOPQBJLLB&-#(%#ŗ 7VPTJüúûüŗ 4JWVN»»S»ýÿĂŗ VSO"..*ĆĞĞ/,(ĄðĞĆ ĆăāĂĚăÿüĚĀúĚþăûāĚÿŗ 7 To the memory of my grandfather, Lauri Kantee 8 9 Preface Meet the new boss Same as the old boss – The Who The document you are reading is an extension of the production quality implementation I used to verify the statements that I put forth in this dissertation. The implementation and this document mutually support each other, and cross-referencing one while studying the other will allow for a more thorough understanding of this work. As usually happens, many people contributed to both, and I will do my best below to account for the ones who made major contributions. In the professor department, my supervisor Heikki Saikkonen has realized that fig- uring things out properly takes a lot of time, so he did not pressure me for status updates when there was nothing new worth reporting. When he challanged me on my writing, his comments were to the point. He also saw the connecting points of my work and put me in touch with several of my later key contacts. At the beginning of my journey, I had discussions with Juha Tuominen, the professor I worked for during my undergraduate years. Our discussions about doctoral work were on a very high level, but I found myself constantly going back to them, and I still remember them now, seven years later. I thank my preliminary examiners, Kirk McKusick and Renzo Davoli, for the extra effort they made to fully understand my work before giving their verdicts. They not only read the manuscript, but also familiarized themselves with the implementation andmanualpages. 10 My friend and I daresay mentor Johannes Helander influenced this work probably more than any other person. The previous statement is somewhat hard to explain, since we almost never talked specifically about my work. Yet, I am certain that the views and philosophies of Johannes run deep throughout my work. Andre Dolenc was the first person to see the potential for using this technology in a product. He also read some of my earlier texts and tried to guide me to a “storytelling” style of writing. I think I may finally have understood his advice. The first person to read a complete draft of the manuscript was Thomas Klausner. Those who know Thomas or his reputation will no doubt understand why I was glad when he offered to proofread and comment. I was rewarded with what seemed like an endless stream of comments, ranging from pointing out typos and grammatical mistakes to asking for clarifications and to highlighting where my discussion was flawed. The vigilance of the NetBSD open source community kept my work honest and of high quality. I could not cut corners in the changes I made, since that would have been called out immediately. At the risk of doing injustice to some by forgetting to name them, I will list a few people from the NetBSD community who helped me throughout the years. Arnaud Ysmal was the first person to use my work for implementing a large-scale application suite. Nicolas Joly contributed numerous patches. I enjoyed the many insightful discussions with Valeriy Ushakov, and I am especially thankful to him for nudging me into the direction of a simple solution for namespace protection. Finally, the many people involved in the NetBSD test effort helped me to evaluate one of the main use cases for this work. Since a doctoral dissertation should not extensively repeat previously published material, on the subject of contributions from my family, please refer to the preface of [Kantee 2004]. 11 Then there is my dear Xi, who, simply put, made sure I survived. When I was hungry, I got food. When I was sleepy, I got coffee, perfectly brewed. Sometimes I got tea, but that usually happened when I had requested it instead of coffee. My output at times was monosyllabic at best (“hack”,“bug”,“fix”, ...), but that did not bother

Flexible Operating System Internals: the Design and Implementation of the Anykernel and Rump Kernels Aalto University

Copy on Write Based File Systems Performance Analysis and Implementation

Uva-DARE (Digital Academic Repository)

Enhancing the Accuracy of Synthetic File System Benchmarks Salam Farhat Nova Southeastern University, [email protected]

High Performance Computing Systems

Communicating Between the Kernel and User-Space in Linux Using Netlink Sockets

Virtfs—A Virtualization Aware File System Pass-Through

DMFS - a Data Migration File System for Netbsd

Internetworking with TCP/IP

Rtkaller: State-Aware Task Generation for RTOS Fuzzing

Review Der Linux Kernel Sourcen Von 4.9 Auf 4.10

Mellanox Linux Switch †

Ubuntu Server Guide Basic Installation Preparing to Install