OS Circular on QEMU/KQEMU/KVM/XenHVM
http://openlab.jp/oscircular/
Kuniyasu Suzaki, Toshiki Yagi, Kengo Iijima, Nguyen Anh Quynh National Institute of Advanced Industrial Science and Technology (AIST,Japan)
ABSTRACT OS Circular is a framework of Internet Disk Image Distributor for virtual machines. The disk images are based on QEMUDM(device model) and boot on QEMU, KQEMU, KVM and XenHVM. They are distributed by the stackable virtual disk "HTTPFUSE CLOOP". It reconstructs a virtual disk with split and compressed block files on HTTP Servers. The file name of block file is the SHA1 digest of its contents. A block file is checked when it is remapped to virtual disk on a client PC. Block files are cached on a local storage and they make possible disconnect operation. Trusted HTTPFUSE CLOOP is stackable and easy to add news block files for security update. These features enable us to distribute OS images for anonymous users via Internet.
1. Introduction We are developing "OS Circular"[1,2] which boots any OSes on anonymous PC. It makes easy to try new OS features. For this purpose we use virtual machine to get a common PC environment on any PCs. OS Circular is based on "HTTPFUSE Xenoppix" [3]. HTTPFUSE Xenoppix offers basic function for Internet Thin Client and enables to boot NetBSD or Plan9. However it is based on paravirtualization of "Xen", which limits Guest OSes and restricts automatic security update of kernel. Furthermore the virtual disk "HTTPFUSE CLOOP" has no security protection. This paper resolves these problems and offers secure and easytouse OS Circulation Environment. FLOZ (Free Live OS Zoo) [4] has similar concept of our OS Circular. It enables us to boot many OSes in the Web browser. FLOZ runs a QEMU virtual machine on the server and users interacts with the Guest OS on the QEMU via a Javabased tight VNC client. It is innovative OS testing environment but the performance and scalability is limited, because it is server centric system and suffers network latency on Internet. Besides the OS on FLOZ doesn't allow the network connection because of network resource and security restriction on the server. Virtual disk of QEMU and VMware is popular to distribute readytouse OS image. However a virtual disk must be treated as one large file because it includes all software. It takes much time to download. Furthermore, a big virtual disk has to rebuild when a bit of data is updated. It is not good for urgent security update. Design principle for OS Circular is to utilize existing infrastructure, because it aims global deployment and minimizes maintenance cost. OS Circular is a client centric system, because current client PC has a powerful CPU and network interface and the function of server is limited to file distribution only. We use HTTP for file distribution because Web hosting service is easytoget and mirror severs and cache proxies are utilized. The validness of contents is checked by a client PC, which is offered by Trusted HTTPFUSE CLOOP. HTTPFUSE CLOOP is stackable virtual disk and enables partial update of virtual disks for application security. OS Circular uses full virtualization because the Guest OSes get automatic security update service. The kernel of Guest OS should be the target of security update. In this paper we describe the detail of OS Circular. In section2 we mention about the virtual machine loader. In section 3 the requirement of virtual disks is described. The detail of Trusted HTTPFUSE CLOOP is mention in section 4. Current implementation of OS Circular is reported in section 5. We discuss related works and future plans in section 6 and conclude in section 7. 2. Virtual Machine Loader Full virtualization offers an abstraction layer for Guest OSes. The abstraction layer provides a common virtualized PC environment on anonymous PC. There are no restriction for Guest OS. We can install Guest OS with normal installer, update the OS with package management software, and migrate the disk image to other PCs which have the same full virtualization. For this purpose, VMware is used on SoulPad[5] , VAT (Virtual Appliance Transceiver) of Collective[6] , LivePC of Moka5[7] and Internet Suspend/Resume[8]. The drivers for real devices are supported by the host OS. SoulPad and VAT of Collective use KNOPPIX as the host OS, because KNOPPIX has "AutoConfig" function that automatically detects available devices at boot time and loads the appropriate Linux drivers. The combination of AutoConfig and full virtualization on the host OS acts as a virtual machine loader. So we use VMKnoppix[9] for OS Circular.
2.1 VMKnoppix VMKnoppix[9] is a collection of Virtual Machine software on the 1CD Linux "KNOPPIX". KNOPPIX prepares device drivers on any client PC and Full virtualization is offered by XenHVM[10], QEMU[11], KEQMU(QEMU Accelerator)[12], and KVM[13]. XenHVM and KVM require virtualization instructions (Intel VT or AMDV) which offer a mode to trap sensitive instructions. VMKnoppix works as a virtual machine loader.
2.2 Unified Device Model XenHVM, QEMU, KEQMU, and KVM assumes the same devices which is called QEMUDM(Device model). They are RealTek RTL8029 for NIC, Cirrus Logic GD5446 for Video Card and etc. The guest OS only have to support these devices. The unified device model enables to boot the disk images on the virtual machines. However the CPU architecture is different among XenHVM, QEMU, KEQMU, and KVM. XenHVM uses the same architecture of the running real CPU. Fortunately the difference doesn't cause big problem because the current operating system uses common i386 packages. The packages run on Pentium Pro or after IA32 CPU.
3. Requirements for Virtual Disks This section examines other important features that virtual disks should offer, such as versioning, globalization, and security. The main requirements of virtual disks are described in [14]. According to the requirements we shape the design for Trusted HTTPFUSE CLOOP presented in the next section.
3.1 Versioning Versioning of virtual disks are desirable feature on virtual machine. Nonpersistent versioning is used for "undo" of operations and persistent versioning is used for "rollback" of OS image. The versioning is also important for sharing and customizing virtual disks. Some virtual machine softwares have "undo" function. For example VMware has Nonpersistent mode and QEMU and UserModeLinux have CopyOnWrite. Xen uses DeviceMapper of Linux for nonpersistent versioning. So Virtual Disks should offer persistent versioning. Trusted HTTPFUSE CLOOP has same versioning of Venti[15]. Venti is Plan9's archival storage system that permanently stores data blocks. A SHA1 digest of the data (called score by Venti) acts as the address of the data. This enforces a writeonce policy since no other data block can be found with the same address. The addresses of multiple writes of the same data are identical, so duplicate data is easily identified and the data block is stored only once. Data blocks cannot be removed, making it ideal for permanent or backup storage. Unfortunately Venti requires special protocol for file system and limits the scalability. Trusted HTTPFUSE CLOOP saves data blocks to files and utilizes HTTP to distribute them. 3.2 Globalization Virtual disks are desired to be shared via Internet for OS circulation. It doesn't mean that virtual disks are downloadable. Virtual disks should be accessed partially when the part is requested. Remote Block Device meets the requirement and there are many choices, for example, iSCSI, AOE(ATA over Ethernet), iFCP, etc. Unfortunately they use special protocols and require special daemons on the server. Most of them are designed for LAN and aren't good for Internet. Trusted HTTPFUSE CLOOP reconstructs a virtual disk with the partial block files which are downloadable via HTTP. Disconnect operation is also desired for virtual disks to achieve Mobile Computing. As file systems "AFS" and "Coda" deal with disconnect operation but they require special protocol and daemons for servers. Stateless Linux[16] offers disconnect operation for Thin Client PC. Stateless Linux runs with network storage or snapshot image saved local storage. Block files of Trusted HTTPFUSE CLOOP are also saved to a local storage and re usable. When whole block files are saved in a local storage, network connection is not required.
3.3 Security Basically security management is independent of virtual disk. Security of kernel and applications should be managed by security software or package manager. The commitment of virtual disk is to keep uncontaminated contents. Especially the security of network virtual disk is to prevent intrusion. One way to prevent intrusion is to establish secure communication. However secure communication has to assign a fixed server and limits scalability. Trusted HTTPFUSE CLOOP adapts a validation mechanism on a client. The driver checks the validity of block contents when it mapped to the virtual disk. It allows distributing block data by anonymous servers.
4. Stackable Virtual Disk "Trusted HTTPFUSE CLOOP" The idea of Trusted HTTPFUSE CLOOP depends on CLOOP of KNOPPIX. CLOOP is convenient because it saves block device to a file and reduces the size. However a CLOOP file is still big. The size of traditional CD KNOPPIX is about 700MB. It must be treated as one file and takes much time to download. When a bit of data is updated, a big CLOOP file has to rebuild. Furthermore CLOOP has no security protection. To solve the problems, we adapt block management style of Venti[15]. Data of a block device is divided by a fixed block size and saved for many small block files. Saved data are also compressed. Block files are treated as network transparent between local and remote. Local storage acts as a cache. The downloaded block files are measured with its contents SHA1 digest. It keeps security for Internet block device. The feature of the network loopback device is followings. • A block file is made of split block device(Default split size is 256KB). A block data is compressed by zlib and saved to a block file. ♦ The original CLOOP's split size "64KB" is too small and makes too many files. • Block files are mapped to a loopback device with the mapping table file. • The mapping of block file is done when a relevant read request is issued. After mapping, the block file is erasable from local storage, because it is redownloaded from Internet. • A name of block file is the hash value of SHA1. If the block contents are same, they are held together a same name file and reduce total volume. The block contents become identifiable because it is confirmed by the SHA1 file name. • Block files are downloadable from HTTP server because HTTP is expected to be strong file delivery infrastructure. For examples, mirror servers and proxy servers. • When mapping a block file to the loopback device, the contents are measured with SHA1 file name which is listed in the mapping table file. • Partial Updatable. When an application is updated original block device, relevant block files are renewed with new SHA1 file name. The mapping table file is also renewed. The rest block files are reusable.
"FUSE" (File system in USErspace)[17] is used to implement the virtual loopback device. Figure 1. Creation of block files from OS image
Figure 1 shows the creation of block files and a mapping table file "map01.idx". They are made from a block device which includes root file system. Each block file has a name of SHA1. "map01.idx" has a mapping table of the block files.
Figure 2. Diagram of Trusted HTTPFUSE CLOOP
Figure 2 shows the diagram of Trusted HTTPFUSE CLOOP. The loopback device is mapped as a normal block device on a virtual machine. The main program of HTTPFUSE CLOOP is implemented as a part of FUSE wrapper program. A mapping table file has to be obtained in security. The mapping table file is used to setup Trusted HTTPFUSE CLOOP. When a read request is issued, Trusted HTTPFUSE CLOOP driver searches a relevant block file with the mapping table. If a relevant file exists on a local storage, the file is used. If not, the file is downloaded from Internet. The block file is downloaded by HTTP server with "libcurl". Each downloaded file is decompressed by "libz" and measured with the SHA1 digest by "libcrypto". The measurement is logged (/var/log/fs_wrapper_PID.log). Even if a block file is fabricated, the block is detected. Figure 3 shows the case to detect falsified block file. 1150452051.109: #00000000(845b31ded38e15c1fa8febf97fe0781f23af98c3) :missed. 1150452051.112: #00000000(845b31ded38e15c1fa8febf97fe0781f23af98c3) :hits. 1150452051.112: #00000001(166cbaedbb1cc836e7c95d7d9943efde5a53829e) :missed. 1150452051.113: #00000002(29c4e363dbad648072751ca1f856e5780dd2981d) :missed. 1150452051.114: #00000003(fa8ad05b713a9cf8a701636ca6c353dc58fd6bfd) :missed. 1150452051.114: #00000004(1f82a543fa9310c44eff6a13618beca3cacffc12) :missed. 1150452051.128: #00000004(1f82a543fa9310c44eff6a13618beca3cacffc12) :hits. 1150452051.128: #00000005(916f62a6e2caedc1279a0a74975a406ddb60ec25) :missed. 1150452051.129: #00000006(19111dfc877a4fe241e125d10176d85a99b4bb86) :missed. 1150452051.130: #00000007(950c1d7623b374f8e03309a93041f5adfa3af80f) :missed. 1150452051.130: #00000008(486472b0ee27157d755bd59d623179cfc0034747) :missed.
1150452375.989: #00000000(845b31ded38e15c1fa8febf97fe0781f23af98c3) :missed. 1150452375.993: #00000000(845b31ded38e15c1fa8febf97fe0781f23af98c3) :hits. 1150452375.993: #00000001(166cbaedbb1cc836e7c95d7d9943efde5a53829e) :missed. 1150452375.994: #00000002(29c4e363dbad648072751ca1f856e5780dd2981d) :missed. 1150452375.995: #00000003(fa8ad05b713a9cf8a701636ca6c353dc58fd6bfd) :missed. 1150452375.996: #00000004(1f82a543fa9310c44eff6a13618beca3cacffc12) :missed. 1150452375.997: #00000004(1f82a543fa9310c44eff6a13618beca3cacffc12) :hits. 1150452375.997: #00000005(916f62a6e2caedc1279a0a74975a406ddb60ec25) :missed. 1150452375.998: #00000006(19111dfc877a4fe241e125d10176d85a99b4bb86) :missed. E: can't validiate block. Figure 3. Log of HTTPFUSE CLOOP (/var/log//fs_wrapper_PID.log). The case to correctly download block files (Upper) and the case to detect a falsified block file (Lower). "missed" indicates to download a block file. "hits" indicated to find a block file at local storage.
Figure 4. Block Mapping of Trusted HTTPFUSE CLOOP
The downloaded block file is stored at local storage (/tmp/blocks/). If the storage space is not enough (more than 80% is used), the previous downloaded files are removed by LIFO of water mark algorithm. Figure 4 indicate a Trusted HTTPFUSE CLOOP from a view of block mapping.
4.1 Update by difference blocks The addressing of HTTPFUSE CLOOP is managed by the mapping table file. So the update of HTTPFUSE CLOOP is done by adding updated block files and renewing mapping table file. The rest block files are reusable. To achieve this function, the file system on HTTPFUSE CLOOP has to treat block unit update as EXT2 file system. "iso9660" is not suitable because partial update of iso9660 changes the location of following blocks. The updated block is saved to a file with new file name of SHA1. Collision of file name will be rarely happened. Even if a collision happens, we can check and fix before uploading the block files. Figure 5 shows an example of update of HTTPFUSE CLOOP. It is useful to update applications of KNOPPIX, especially for security update. Furthermore we can rollback to an old file system if the old mapping table "map02.idx" and block files exist.
Figure 5. Partial update of HTTPFUSE CLOOP
4.2 Optimization Trusted HTTPFUSE CLOOP is vulnerable to network latency and causes narrow bandwidth. Especially boot time has no cue to hide latency. To solve this problem, we add two functions, "netselect" and "DLAHEAD(download ahead) ". Netselect enables us to find the best site and DLAHEAD enables us to download the necessary block files in advance. "netselect" is a software to search the shortest latency site among candidates using "ping". We prepare several HTTP sites for OS Circular and add a boot option of "netselect" to find nearest site automatically. We arranged the HTTP sites to be dispersed across the global as possible. DLAHEAD downloads the necessary block files in advance. The list of necessary block files is made from the boot profile. DLAHEAD establishes multiple connections and downloads block files in parallel. The default number of connections is 4. The downloaded block files are saved to a local storage and work as cache.
5. OS Circular The images of Guest OSes are distributed by Trusted HTTPFUSE CLOOP. VMKnoppix boots the OS on a client PC. As current implementation Debian GNU/Linux and FreeBSD are bootable on QEMU/KQEMU/KVM/Xen HVM of VMKnoppix. 5.1 Security Update Debian GNU/Linux is supported by the strong community and offers us network update scheme[18]. APT (Advanced Package Tool) is a network package management in Debian. APT uses a list of repositories with available packages. If there is a newer package version in the repository's package list, APT downloads the package and hands the process over to "dpkg" (a mediumlevel package manager for Debian). The package manager checks integrity of the package, updates the software and keeps the security of total contents. Figure 6 shows the image of security update on OS circular. When the master OS image is updated by "apt get" command, new mapping file and block files are created. These files are uploaded on HTTP Servers. Client PC gets a list of mapping table files and selects one. The least mapping table file is the least updated OS. When we want to use old OS image, the old mapping table file is selected.
Figure 6. Security update of Trusted HTTPFUSE CLOOP on OS Circular
We checked the upgrade of Debian packages from 07/Dec/2006 to 11/Dec/2006. The total packages were 854 and a 4GB virtual disk were translated to 14523 block files(1.9GB) on 07/Dec/2006. The 104 packages were updated and 3420 block files (335MB) are created on 11/Dec/2006. The new block files and mapping table file were added to HTTP servers and the clients could use new OS image by rebooting with new mapping table file.
Figure 7. Download Sites for OS Circular 5.2 Download sites for OS Circular Figure 7 shows the location of the sites. We arranged the sites to be dispersed across the globe as possible as we could. These sites prevent intercontinental connection and make reasonable latency for downloading block files in US, EU and North Asia. Most of them are commercial Web hosting services which are reasonable price and well maintained.
5.3 Performance We measured the performance on OS Circular on IBM/Renovo Think PAD T60 with Core Solo T1300 1.67Ghz. The HTTP server for block files was put at an Internet provider. The network latency was about 10msec. The boot time of Debian on OS Circular was measured till the gdm (GNOME Display Manager) was appeared. The boot time with DLAHEAD was 2 minutes and boot time without DLAHEAD was 4 minutes 10 seconds.
Figure 8. Throughput of Debianboot on OS Circular. Figure 9. Amount of download block files
Figure 8 shows the throughput of Debianboot on OS Circular. The result shows the DLAHEAD is effective and makes quick boot. The necessary block files are downloaded before they are required. The average bandwidth was 2.6 times wider than no DLAHEAD. Figure 9 shows the amount of download block files. The amount with DLAHEAD was 87MB. The amount without DLAEAD was 80MB. The 7MB difference was caused by the redundant download. Usually DLAHEAD downloads block files before requested but sometimes can't catch up. At that time same block files are downloaded 2 times.
6. Discussions 6.1 Live update Some researches propose live update of OS on virtual machine, for examples Intel vPro and LUCOS[19]. Virtual machine and Guest OS communicate each other and allows live update for security. Current OS Circular has no function for live update but we have a plan to integrate IDS and IPS.
6.2 Trusted Boot The Trusted Computing Group (TCG)[20] has released specifications about the Trusted Platform Module (TPM) chip that has small amount of volatile and nonvolatile storage and cryptographic execution engines. A TPM chip is equipped in a current note PC and it is used for trusted boot. The chain of trust is kept with the control sequence of TPMenabled BIOS, "Trusted GRUB" [21] and trusted Linux kernel. We have a plan to integrate the trusted boot to OS Circular and prepare remote attestation server to check the integrity of the boot. User can use the framework to check individual boot.
7. Conclusions We proposed a framework of Internet Disk Image Distributor "OS Circular", which is a client centric OS migration system. OS Circular utilizes easytoget infrastructure; full virtualization, Web hosting and security update service. The disk images are accessible via Trusted HTTPFUSE CLOOP and booted on full virtualization environment XenHVM, QEMU, KEQMU, and KVM of VMKnoppix on anonymous PC. The disk image is maintained by security management and users can try OS without installation. The validness of contents is checked on a client PC and the disk image can be distributed by anonymous servers. The parts of disk image are cached on a local storage and they make disconnect operation. The current target OSes are Debian GNU/Linux and FeeBSD. We will provide them for anonymous users and improve the usability and performance. We will integrate OS Circular and Trusted Boot which is based on TPM chip. The integrity of OS will be checked by this scheme.
REFERENCES [1] Kuniyasu Suzaki, Toshiki Yagi, Kengo Iijima, Nguyen Anh Quynh , "OSCircular": A Framework of Internet Client with Xen, XenSummit 2007 Spring, http://www.xensource.com/files/xensummit_4/XenSummit07Spring Suzaki.pdf [2] Kuniyasu Suzaki, Toshiki Yagi, Kengo Iijima, Junichi Tsukamoto, Megumi Nakamura, Seiji Munetoh , OS Circulation Environment "Trusted HTTPFUSE Xenoppix", Virtualization MiniConf at Linux Conference Australia 2007 http://mirror.linux.org.au/linux.conf.au/2007/video/monday/monday_1450_Virtualisation.pdf [3] Kuniyasu Suzaki, Toshiki Yagi, Kengo Iijima, Kenji Kitagawa, Shuichi Tashiro, "HTTPFUSE Xenoppix", Linux Symposium 2006. [4] Free Live OS Zoo, http://www.oszoo.org/wiki/index.php/Free_Live_OS_Zoo [5] Ramón Cáceres, Casey Carter, Chandra Narayanaswami, and Mandayam Raghunath, Reincarnating PCs with Portable SoulPads, Mobisys, MobiSys'05 [6] Ramesh Chandra, Nickolai Zeldovich, Constantine Sapuntzakis, Monica S. Lam, The Collective: A Cache Based System Management Architecture, NSDI'05. [7] Moka5 LivePC: http://www.moka5.com [8] Michael Kozuch, Mahadev Satyanarayanan, Internet Suspend/Resue, IEEE WMCSA'02. [9] VMKnoppix: http://unit.aist.go.jp/itri/knoppix/vmknoppix/indexen.html [10] Xen: http://www.xensource.com/ [11] QEMU: http://fabrice.bellard.free.fr/qemu/ [12] KQEMU (QEMU Accelerator): http://fabrice.bellard.free.fr/qemu/kqemudoc.html [13] KVM http://kvm.qumranet.com/ [14] Ban Pfaff, Tal Garfinkel, Mendel Rosenblum, Virtualization Aware File Systems: Getting Beyond the Limitations of Virtual Disks,NSDI'06. [15] S. Quinlan, and D. Dorward: Venti: a new approach to archival storage, the USENIX Conference on File and Storage Technologies, Monterey, CA, pp. 89–102 (2002). [16] Stateless Linux, http://fedoraproject.org/wiki/StatelessLinux [17] FUSE (File system in USErspace), http://fuse.sourceforge.net/ [18] Martin F. Krafft, The Debian System: Concepts and Techniques, Open Source Press GmbH (2005) [19] Haibo Chen, Rong Chen, Fengzhe Zhang, Binyu Zang, PenChung Yew, Live Updating Operating Systems Using Virtualization, 2nd international conference on Virtual execution environments 2006. [20] Trusted Computing Group, https://www.trustedcomputinggroup.org [21] Trusted GRUB, http://trousers.sourceforge.net/grub.html