OS Circular on QEMU/KQEMU/KVM/Xen­HVM

http://openlab.jp/oscircular/

Kuniyasu Suzaki, Toshiki Yagi, Kengo Iijima, Nguyen Anh Quynh National Institute of Advanced Industrial Science and Technology (AIST,Japan)

ABSTRACT OS Circular is a framework of Internet Disk Image Distributor for virtual machines. The disk images are based on QEMU­DM(device model) and boot on QEMU, KQEMU, KVM and Xen­HVM. They are distributed by the stackable virtual disk "HTTP­FUSE CLOOP". It re­constructs a virtual disk with split and compressed block files on HTTP Servers. The file name of block file is the SHA1 digest of its contents. A block file is checked when it is re­mapped to virtual disk on a client PC. Block files are cached on a local storage and they make possible disconnect operation. Trusted HTTP­FUSE CLOOP is stackable and easy to add news block files for security update. These features enable us to distribute OS images for anonymous users via Internet.

1. Introduction We are developing "OS Circular"[1,2] which boots any OSes on anonymous PC. It makes easy to try new OS features. For this purpose we use virtual machine to get a common PC environment on any PCs. OS Circular is based on "HTTP­FUSE Xenoppix" [3]. HTTP­FUSE Xenoppix offers basic function for Internet Thin Client and enables to boot NetBSD or Plan9. However it is based on para­virtualization of "Xen", which limits Guest OSes and restricts automatic security update of kernel. Furthermore the virtual disk "HTTP­FUSE CLOOP" has no security protection. This paper resolves these problems and offers secure and easy­to­use OS Circulation Environment. FLOZ (Free Live OS Zoo) [4] has similar concept of our OS Circular. It enables us to boot many OSes in the Web browser. FLOZ runs a QEMU virtual machine on the server and users interacts with the Guest OS on the QEMU via a Java­based tight VNC client. It is innovative OS testing environment but the performance and scalability is limited, because it is server centric system and suffers network latency on Internet. Besides the OS on FLOZ doesn't allow the network connection because of network resource and security restriction on the server. Virtual disk of QEMU and VMware is popular to distribute ready­to­use OS image. However a virtual disk must be treated as one large file because it includes all software. It takes much time to download. Furthermore, a big virtual disk has to re­build when a bit of data is updated. It is not good for urgent security update. Design principle for OS Circular is to utilize existing infrastructure, because it aims global deployment and minimizes maintenance cost. OS Circular is a client centric system, because current client PC has a powerful CPU and network interface and the function of server is limited to file distribution only. We use HTTP for file distribution because Web hosting service is easy­to­get and mirror severs and cache proxies are utilized. The validness of contents is checked by a client PC, which is offered by Trusted HTTP­FUSE CLOOP. HTTP­FUSE CLOOP is stackable virtual disk and enables partial update of virtual disks for application security. OS Circular uses full virtualization because the Guest OSes get automatic security update service. The kernel of Guest OS should be the target of security update. In this paper we describe the detail of OS Circular. In section2 we mention about the virtual machine loader. In section 3 the requirement of virtual disks is described. The detail of Trusted HTTP­FUSE CLOOP is mention in section 4. Current implementation of OS Circular is reported in section 5. We discuss related works and future plans in section 6 and conclude in section 7. 2. Virtual Machine Loader Full virtualization offers an abstraction layer for Guest OSes. The abstraction layer provides a common virtualized PC environment on anonymous PC. There are no restriction for Guest OS. We can install Guest OS with normal installer, update the OS with package management software, and migrate the disk image to other PCs which have the same full virtualization. For this purpose, VMware is used on SoulPad[5] , VAT (Virtual Appliance Transceiver) of Collective[6] , LivePC of Moka5[7] and Internet Suspend/Resume[8]. The drivers for real devices are supported by the host OS. SoulPad and VAT of Collective use as the host OS, because KNOPPIX has "AutoConfig" function that automatically detects available devices at boot time and loads the appropriate Linux drivers. The combination of AutoConfig and full virtualization on the host OS acts as a virtual machine loader. So we use VMKnoppix[9] for OS Circular.

2.1 VMKnoppix VMKnoppix[9] is a collection of Virtual Machine software on the 1CD Linux "KNOPPIX". KNOPPIX prepares device drivers on any client PC and Full virtualization is offered by Xen­HVM[10], QEMU[11], KEQMU(QEMU Accelerator)[12], and KVM[13]. Xen­HVM and KVM require virtualization instructions (Intel VT or AMD­V) which offer a mode to trap sensitive instructions. VMKnoppix works as a virtual machine loader.

2.2 Unified Device Model Xen­HVM, QEMU, KEQMU, and KVM assumes the same devices which is called QEMU­DM(Device model). They are RealTek RTL8029 for NIC, Cirrus Logic GD5446 for Video Card and etc. The guest OS only have to support these devices. The unified device model enables to boot the disk images on the virtual machines. However the CPU architecture is different among Xen­HVM, QEMU, KEQMU, and KVM. Xen­HVM uses the same architecture of the running real CPU. Fortunately the difference doesn't cause big problem because the current operating system uses common i386 packages. The packages run on Pentium Pro or after IA32 CPU.

3. Requirements for Virtual Disks This section examines other important features that virtual disks should offer, such as versioning, globalization, and security. The main requirements of virtual disks are described in [14]. According to the requirements we shape the design for Trusted HTTP­FUSE CLOOP presented in the next section.

3.1 Versioning Versioning of virtual disks are desirable feature on virtual machine. Non­persistent versioning is used for "undo" of operations and persistent versioning is used for "rollback" of OS image. The versioning is also important for sharing and customizing virtual disks. Some virtual machine softwares have "undo" function. For example VMware has Non­persistent mode and QEMU and UserModeLinux have CopyOnWrite. Xen uses DeviceMapper of Linux for non­persistent versioning. So Virtual Disks should offer persistent versioning. Trusted HTTP­FUSE CLOOP has same versioning of Venti[15]. Venti is Plan9's archival storage system that permanently stores data blocks. A SHA1 digest of the data (called score by Venti) acts as the address of the data. This enforces a write­once policy since no other data block can be found with the same address. The addresses of multiple writes of the same data are identical, so duplicate data is easily identified and the data block is stored only once. Data blocks cannot be removed, making it ideal for permanent or backup storage. Unfortunately Venti requires special protocol for and limits the scalability. Trusted HTTP­FUSE CLOOP saves data blocks to files and utilizes HTTP to distribute them. 3.2 Globalization Virtual disks are desired to be shared via Internet for OS circulation. It doesn't mean that virtual disks are downloadable. Virtual disks should be accessed partially when the part is requested. Remote Block Device meets the requirement and there are many choices, for example, iSCSI, AOE(ATA over Ethernet), iFCP, etc. Unfortunately they use special protocols and require special daemons on the server. Most of them are designed for LAN and aren't good for Internet. Trusted HTTP­FUSE CLOOP re­constructs a virtual disk with the partial block files which are downloadable via HTTP. Disconnect operation is also desired for virtual disks to achieve Mobile Computing. As file systems "AFS" and "Coda" deal with disconnect operation but they require special protocol and daemons for servers. Stateless Linux[16] offers disconnect operation for Thin Client PC. Stateless Linux runs with network storage or snapshot image saved local storage. Block files of Trusted HTTP­FUSE CLOOP are also saved to a local storage and re­ usable. When whole block files are saved in a local storage, network connection is not required.

3.3 Security Basically security management is independent of virtual disk. Security of kernel and applications should be managed by security software or package manager. The commitment of virtual disk is to keep uncontaminated contents. Especially the security of network virtual disk is to prevent intrusion. One way to prevent intrusion is to establish secure communication. However secure communication has to assign a fixed server and limits scalability. Trusted HTTP­FUSE CLOOP adapts a validation mechanism on a client. The driver checks the validity of block contents when it mapped to the virtual disk. It allows distributing block data by anonymous servers.

4. Stackable Virtual Disk "Trusted HTTP­FUSE CLOOP" The idea of Trusted HTTP­FUSE CLOOP depends on CLOOP of KNOPPIX. CLOOP is convenient because it saves block device to a file and reduces the size. However a CLOOP file is still big. The size of traditional CD­ KNOPPIX is about 700MB. It must be treated as one file and takes much time to download. When a bit of data is updated, a big CLOOP file has to re­build. Furthermore CLOOP has no security protection. To solve the problems, we adapt block management style of Venti[15]. Data of a block device is divided by a fixed block size and saved for many small block files. Saved data are also compressed. Block files are treated as network transparent between local and remote. Local storage acts as a cache. The downloaded block files are measured with its contents SHA1 digest. It keeps security for Internet block device. The feature of the network loopback device is followings. • A block file is made of split block device(Default split size is 256KB). A block data is compressed by zlib and saved to a block file. ♦ The original CLOOP's split size "64KB" is too small and makes too many files. • Block files are mapped to a loopback device with the mapping table file. • The mapping of block file is done when a relevant read request is issued. After mapping, the block file is erasable from local storage, because it is re­downloaded from Internet. • A name of block file is the hash value of SHA1. If the block contents are same, they are held together a same name file and reduce total volume. The block contents become identifiable because it is confirmed by the SHA1 file name. • Block files are downloadable from HTTP server because HTTP is expected to be strong file delivery infrastructure. For examples, mirror servers and proxy servers. • When mapping a block file to the loopback device, the contents are measured with SHA1 file name which is listed in the mapping table file. • Partial Updatable. When an application is updated original block device, relevant block files are renewed with new SHA1 file name. The mapping table file is also renewed. The rest block files are reusable.

"FUSE" (File system in USEr­space)[17] is used to implement the virtual loopback device. Figure 1. Creation of block files from OS image

Figure 1 shows the creation of block files and a mapping table file "map01.idx". They are made from a block device which includes root file system. Each block file has a name of SHA1. "map01.idx" has a mapping table of the block files.

Figure 2. Diagram of Trusted HTTP­FUSE CLOOP

Figure 2 shows the diagram of Trusted HTTP­FUSE CLOOP. The loopback device is mapped as a normal block device on a virtual machine. The main program of HTTP­FUSE CLOOP is implemented as a part of FUSE wrapper program. A mapping table file has to be obtained in security. The mapping table file is used to setup Trusted HTTP­FUSE CLOOP. When a read request is issued, Trusted HTTP­FUSE CLOOP driver searches a relevant block file with the mapping table. If a relevant file exists on a local storage, the file is used. If not, the file is downloaded from Internet. The block file is downloaded by HTTP server with "libcurl". Each downloaded file is decompressed by "libz" and measured with the SHA1 digest by "libcrypto". The measurement is logged (/var/log/fs_wrapper_PID.log). Even if a block file is fabricated, the block is detected. Figure 3 shows the case to detect falsified block file. 1150452051.109: #00000000(845b31ded38e15c1fa8febf97fe0781f23af98c3) :missed. 1150452051.112: #00000000(845b31ded38e15c1fa8febf97fe0781f23af98c3) :hits. 1150452051.112: #00000001(166cbaedbb1cc836e7c95d7d9943efde5a53829e) :missed. 1150452051.113: #00000002(29c4e363dbad648072751ca1f856e5780dd2981d) :missed. 1150452051.114: #00000003(fa8ad05b713a9cf8a701636ca6c353dc58fd6bfd) :missed. 1150452051.114: #00000004(1f82a543fa9310c44eff6a13618beca3cacffc12) :missed. 1150452051.128: #00000004(1f82a543fa9310c44eff6a13618beca3cacffc12) :hits. 1150452051.128: #00000005(916f62a6e2caedc1279a0a74975a406ddb60ec25) :missed. 1150452051.129: #00000006(19111dfc877a4fe241e125d10176d85a99b4bb86) :missed. 1150452051.130: #00000007(950c1d7623b374f8e03309a93041f5adfa3af80f) :missed. 1150452051.130: #00000008(486472b0ee27157d755bd59d623179cfc0034747) :missed.

1150452375.989: #00000000(845b31ded38e15c1fa8febf97fe0781f23af98c3) :missed. 1150452375.993: #00000000(845b31ded38e15c1fa8febf97fe0781f23af98c3) :hits. 1150452375.993: #00000001(166cbaedbb1cc836e7c95d7d9943efde5a53829e) :missed. 1150452375.994: #00000002(29c4e363dbad648072751ca1f856e5780dd2981d) :missed. 1150452375.995: #00000003(fa8ad05b713a9cf8a701636ca6c353dc58fd6bfd) :missed. 1150452375.996: #00000004(1f82a543fa9310c44eff6a13618beca3cacffc12) :missed. 1150452375.997: #00000004(1f82a543fa9310c44eff6a13618beca3cacffc12) :hits. 1150452375.997: #00000005(916f62a6e2caedc1279a0a74975a406ddb60ec25) :missed. 1150452375.998: #00000006(19111dfc877a4fe241e125d10176d85a99b4bb86) :missed. E: can't validiate block. Figure 3. Log of HTTP­FUSE CLOOP (/var/log//fs_wrapper_PID.log). The case to correctly download block files (Upper) and the case to detect a falsified block file (Lower). "missed" indicates to download a block file. "hits" indicated to find a block file at local storage.

Figure 4. Block Mapping of Trusted HTTP­FUSE CLOOP

The downloaded block file is stored at local storage (/tmp/blocks/). If the storage space is not enough (more than 80% is used), the previous downloaded files are removed by LIFO of water mark algorithm. Figure 4 indicate a Trusted HTTP­FUSE CLOOP from a view of block mapping.

4.1 Update by difference blocks The addressing of HTTP­FUSE CLOOP is managed by the mapping table file. So the update of HTTP­FUSE CLOOP is done by adding updated block files and renewing mapping table file. The rest block files are reusable. To achieve this function, the file system on HTTP­FUSE CLOOP has to treat block unit update as file system. "iso9660" is not suitable because partial update of iso9660 changes the location of following blocks. The updated block is saved to a file with new file name of SHA1. Collision of file name will be rarely happened. Even if a collision happens, we can check and fix before uploading the block files. Figure 5 shows an example of update of HTTP­FUSE CLOOP. It is useful to update applications of KNOPPIX, especially for security update. Furthermore we can rollback to an old file system if the old mapping table "map02.idx" and block files exist.

Figure 5. Partial update of HTTP­FUSE CLOOP

4.2 Optimization Trusted HTTP­FUSE CLOOP is vulnerable to network latency and causes narrow bandwidth. Especially boot time has no cue to hide latency. To solve this problem, we add two functions, "netselect" and "DLAHEAD(download ahead) ". Netselect enables us to find the best site and DLAHEAD enables us to download the necessary block files in advance. "netselect" is a software to search the shortest latency site among candidates using "ping". We prepare several HTTP sites for OS Circular and add a boot option of "netselect" to find nearest site automatically. We arranged the HTTP sites to be dispersed across the global as possible. DLAHEAD downloads the necessary block files in advance. The list of necessary block files is made from the boot profile. DLAHEAD establishes multiple connections and downloads block files in parallel. The default number of connections is 4. The downloaded block files are saved to a local storage and work as cache.

5. OS Circular The images of Guest OSes are distributed by Trusted HTTP­FUSE CLOOP. VMKnoppix boots the OS on a client PC. As current implementation Debian GNU/Linux and FreeBSD are bootable on QEMU/KQEMU/KVM/Xen­ HVM of VMKnoppix. 5.1 Security Update Debian GNU/Linux is supported by the strong community and offers us network update scheme[18]. APT (Advanced Package Tool) is a network package management in Debian. APT uses a list of repositories with available packages. If there is a newer package version in the repository's package list, APT downloads the package and hands the process over to "dpkg" (a medium­level package manager for Debian). The package manager checks integrity of the package, updates the software and keeps the security of total contents. Figure 6 shows the image of security update on OS circular. When the master OS image is updated by "apt­ get" command, new mapping file and block files are created. These files are uploaded on HTTP Servers. Client PC gets a list of mapping table files and selects one. The least mapping table file is the least updated OS. When we want to use old OS image, the old mapping table file is selected.

Figure 6. Security update of Trusted HTTP­FUSE CLOOP on OS Circular

We checked the upgrade of Debian packages from 07/Dec/2006 to 11/Dec/2006. The total packages were 854 and a 4GB virtual disk were translated to 14523 block files(1.9GB) on 07/Dec/2006. The 104 packages were updated and 3420 block files (335MB) are created on 11/Dec/2006. The new block files and mapping table file were added to HTTP servers and the clients could use new OS image by rebooting with new mapping table file.

Figure 7. Download Sites for OS Circular 5.2 Download sites for OS Circular Figure 7 shows the location of the sites. We arranged the sites to be dispersed across the globe as possible as we could. These sites prevent intercontinental connection and make reasonable latency for downloading block files in US, EU and North Asia. Most of them are commercial Web hosting services which are reasonable price and well maintained.

5.3 Performance We measured the performance on OS Circular on IBM/Renovo Think PAD T60 with Core Solo T1300 1.67Ghz. The HTTP server for block files was put at an Internet provider. The network latency was about 10msec. The boot time of Debian on OS Circular was measured till the gdm (GNOME Display Manager) was appeared. The boot time with DLAHEAD was 2 minutes and boot time without DLAHEAD was 4 minutes 10 seconds.

Figure 8. Throughput of Debian­boot on OS Circular. Figure 9. Amount of download block files

Figure 8 shows the throughput of Debian­boot on OS Circular. The result shows the DLAHEAD is effective and makes quick boot. The necessary block files are downloaded before they are required. The average bandwidth was 2.6 times wider than no DLAHEAD. Figure 9 shows the amount of download block files. The amount with DLAHEAD was 87MB. The amount without DLAEAD was 80MB. The 7MB difference was caused by the redundant download. Usually DLAHEAD downloads block files before requested but sometimes can't catch up. At that time same block files are downloaded 2 times.

6. Discussions 6.1 Live update Some researches propose live update of OS on virtual machine, for examples Intel vPro and LUCOS[19]. Virtual machine and Guest OS communicate each other and allows live update for security. Current OS Circular has no function for live update but we have a plan to integrate IDS and IPS.

6.2 Trusted Boot The Trusted Computing Group (TCG)[20] has released specifications about the Trusted Platform Module (TPM) chip that has small amount of volatile and non­volatile storage and cryptographic execution engines. A TPM chip is equipped in a current note PC and it is used for trusted boot. The chain of trust is kept with the control sequence of TPM­enabled BIOS, "Trusted GRUB" [21] and trusted Linux kernel. We have a plan to integrate the trusted boot to OS Circular and prepare remote attestation server to check the integrity of the boot. User can use the framework to check individual boot.

7. Conclusions We proposed a framework of Internet Disk Image Distributor "OS Circular", which is a client centric OS migration system. OS Circular utilizes easy­to­get infrastructure; full virtualization, Web hosting and security update service. The disk images are accessible via Trusted HTTP­FUSE CLOOP and booted on full virtualization environment Xen­HVM, QEMU, KEQMU, and KVM of VMKnoppix on anonymous PC. The disk image is maintained by security management and users can try OS without installation. The validness of contents is checked on a client PC and the disk image can be distributed by anonymous servers. The parts of disk image are cached on a local storage and they make disconnect operation. The current target OSes are Debian GNU/Linux and FeeBSD. We will provide them for anonymous users and improve the usability and performance. We will integrate OS Circular and Trusted Boot which is based on TPM chip. The integrity of OS will be checked by this scheme.

REFERENCES [1] Kuniyasu Suzaki, Toshiki Yagi, Kengo Iijima, Nguyen Anh Quynh , "OS­Circular": A Framework of Internet Client with Xen, XenSummit 2007 Spring, http://www.xensource.com/files/xensummit_4/XenSummit07Spring­ Suzaki.pdf [2] Kuniyasu Suzaki, Toshiki Yagi, Kengo Iijima, Junichi Tsukamoto, Megumi Nakamura, Seiji Munetoh , OS Circulation Environment "Trusted HTTP­FUSE Xenoppix", Virtualization MiniConf at Linux Conference Australia 2007 http://mirror.linux.org.au/linux.conf.au/2007/video/monday/monday_1450_Virtualisation.pdf [3] Kuniyasu Suzaki, Toshiki Yagi, Kengo Iijima, Kenji Kitagawa, Shuichi Tashiro, "HTTP­FUSE Xenoppix", Linux Symposium 2006. [4] Free Live OS Zoo, http://www.oszoo.org/wiki/index.php/Free_Live_OS_Zoo [5] Ramón Cáceres, Casey Carter, Chandra Narayanaswami, and Mandayam Raghunath, Reincarnating PCs with Portable SoulPads, Mobisys, MobiSys'05 [6] Ramesh Chandra, Nickolai Zeldovich, Constantine Sapuntzakis, Monica S. Lam, The Collective: A Cache Based System Management Architecture, NSDI'05. [7] Moka5 LivePC: http://www.moka5.com [8] Michael Kozuch, Mahadev Satyanarayanan, Internet Suspend/Resue, IEEE WMCSA'02. [9] VMKnoppix: http://unit.aist.go.jp/itri/knoppix/vmknoppix/index­en.html [10] Xen: http://www.xensource.com/ [11] QEMU: http://fabrice.bellard.free.fr/qemu/ [12] KQEMU (QEMU Accelerator): http://fabrice.bellard.free.fr/qemu/kqemu­doc.html [13] KVM http://kvm.qumranet.com/ [14] Ban Pfaff, Tal Garfinkel, Mendel Rosenblum, Virtualization Aware File Systems: Getting Beyond the Limitations of Virtual Disks,NSDI'06. [15] S. Quinlan, and D. Dorward: Venti: a new approach to archival storage, the USENIX Conference on File and Storage Technologies, Monterey, CA, pp. 89–102 (2002). [16] Stateless Linux, http://fedoraproject.org/wiki/StatelessLinux [17] FUSE (File system in USEr­space), http://fuse.sourceforge.net/ [18] Martin F. Krafft, The Debian System: Concepts and Techniques, Open Source Press GmbH (2005) [19] Haibo Chen, Rong Chen, Fengzhe Zhang, Binyu Zang, Pen­Chung Yew, Live Updating Operating Systems Using Virtualization, 2nd international conference on Virtual execution environments 2006. [20] Trusted Computing Group, https://www.trustedcomputinggroup.org [21] Trusted GRUB, http://trousers.sourceforge.net/grub.html