Copyright © 2019 American Scientific Publishers. All rights reserved. Printed in the United States of America. Journal of Low Power Electronics, Vol. 15, 273–281, 2019

Performance Analysis of Microkernel Based Virtualization Techniques on Embedded Systems

Deepa Mathew1, Bijoy A. Jose1,∗, and Priyadarsan Patra2

1Department of Electronics, Cochin University of Science and Technology, Kochi 682022, Kerala, India
2Dean of the School of Computer Science and Engineering, Xavier University, Bhubaneswar 751013, India

(Received: 31 March 2019; Accepted: 17 April 2019)

Exploiting the benefits of virtualization in embedded technology has opened up new avenues for effective resource utilization, increased scalability, security, and cost savings. With this in perspective, performance benchmarking of virtualized embedded systems is important. In this paper, we assess the performance of various virtualization techniques, such as paravirtualization and hardware-assisted virtualization, in a desktop environment. Microkernel based virtualization techniques are more suitable for the embedded environment, owing to their low memory footprint and the security advantage that only a small amount of trusted code runs at a high privilege level. We have used a microkernel based implementation to analyze the performance of an OS in a microkernel based virtual environment and compared it with the performance of an OS in a nonvirtual environment on the same board. In addition, we have analyzed the performance of the different types of virtualization possible with a microkernel on a low power ARM based embedded system using a benchmarking tool.

Keywords: Virtualization, Microkernel, Embedded Systems, Virtual Machines.

1. INTRODUCTION

The term virtualization refers to the creation of a virtual version of any device. A virtual machine would be an independent working environment which provides isolation and the experience of working with physical hardware. Virtualization allows more than one virtual machine (VM) on the same physical hardware [1]. This way the resources can be shared between different VMs based on demand or pre-decided schemes. In the enterprise sector, this also gives additional advantages in configuring different Operating Systems (OS) on the same physical machine and operating them in parallel. Each OS runs on a virtual machine and each of them is unaware of the other virtual machines accessing the system. Alternatively, this facility also provides the basis for having multiple sessions for different users and many other extensions.

A VM is an environment created by the virtualization layer, which is also known as a hypervisor [2]. VMs are envisioned to provide isolation and protection of resources. Virtualization began when IBM first introduced virtualization for servers. It has evolved over the years and now finds application in embedded systems. Some of the advantages of using virtualization are effective resource utilization, increased scalability, reliability, and cost savings in terms of energy and space. Virtualization allows simultaneous execution of a real time OS and a general purpose OS on the same hardware, which improves the performance and security of the systems.

In today's world, embedded system boards have processing capability equal to that of desktop computers, which has facilitated enhanced capabilities in application areas like robotics, automotive, and mobile systems. The functionality expected from a mobile phone is increasing and becoming more complex, and it may need to run multiple OSes on the same phone. Most of the recent ARM boards come equipped with multicore architectures, which facilitates extending the advantages of virtualization to embedded system applications. As the technology advances further and boards become more economical, virtualization allows us to consolidate the functionalities provided by two or three […] onto a single multicore board. Hence, this makes it possible to capitalize on the advantages of cost savings and optimized resource utilization. Each functionality resides on its own virtual machine and these virtual machines are independent of each other. Failure of one virtual machine does not affect the others. This finds application in the automotive and control industries.

∗Author to whom correspondence should be addressed. Email: [email protected]

J. Low Power Electron. 2019, Vol. 15, No. 2 1546-1998/2019/15/273/009 doi:10.1166/jolpe.2019.1602 273 Performance Analysis of Microkernel Based Virtualization Techniques on Embedded Systems Mathew et al.

There are different approaches by which virtualization is possible in embedded systems, such as Xen, KVM, QEMU, and different microkernel based solutions such as OKL4, seL4, and Fiasco.OC. Microkernel based virtualization is apt for embedded systems due to its low memory footprint. The microkernel selected for this work is Fiasco.OC, a third generation microkernel from the L4 family developed at TU Dresden. Fiasco.OC supports both paravirtualization and hardware-assisted virtualization. L4Linux, which is paravirtualized Linux, runs on top of Fiasco.OC. The benchmarking tools Coremark and LMbench have been used to analyze the performance.

2. BACKGROUND AND RELATED WORK

2.1. Virtualization

Virtualization allows multiple VMs to run in parallel on a single piece of hardware. A software layer called the hypervisor or Virtual Machine Monitor (VMM) provides the abstraction of the underlying hardware to the VMs [3]. The VMM provides isolation for each VM. The hypervisor is also known as a control program, as it controls and manages the virtual machines running on top of it. VMMs are classified into two types [2], depending on the layer at which the VMM sits, as shown in Figure 1. When the hypervisor runs directly on top of the hardware it is called a Type 1 VMM or a bare metal hypervisor. In a Type 1 VMM, the hypervisor runs at the highest privilege mode. An example of Type 1 is the Xen hypervisor. In a Type 2 VMM, the hypervisor runs on top of the host operating system and hence it is called a hosted hypervisor. Examples of Type 2 hypervisors are VMware Workstation, VMware Player, and VirtualBox. A Type 2 hypervisor can make use of the services provided by the host operating system, but has higher overhead as it can access the hardware only through the host OS. Different virtualization approaches are available for an embedded system [4]; from among them we chose microkernel based virtualization due to its low memory footprint and small trusted computing base.

2.2. Microkernel as Hypervisor

A microkernel provides only the basic functionality required for the functioning of the operating system, such as address space management, thread management, and interprocess communication. All the other required functionalities are added as user level services, and these services run in different address spaces or threads. In a microkernel based operating system, the microkernel runs at a higher privileged mode, known as kernel mode. All other functionalities like […] management, network management, memory management, and device drivers run at a lower privilege level, i.e., as user level threads on top of the microkernel, as shown in Figure 2. So the amount of code running at the higher privilege level is smaller than in a monolithic kernel. This helps to build a robust, trustworthy base for the system, which in turn improves the overall system security. These features of a microkernel match the requirements of a hypervisor [5]. A microkernel based system allows easier management of code, as only the required functionality needs to be added to the system as a user level service.

2.2.1. L4 Family

Work on first generation microkernels, e.g., Mach and Chorus, started in the 1980s and was not successful because of their poor design and performance [6]. In the 1990s the second generation microkernel L4 was created by Jochen Liedtke, mainly to overcome the performance limitations of first generation microkernels. L4 was developed around the concept of a minimal kernel: a concept is tolerated inside the kernel only if it cannot be moved outside the kernel [7]. The main reimplementation work on L4 happened at three universities, which are shown in Figure 3 along with the names of the main implementations [8]. At the University of Karlsruhe, a highly portable microkernel named L4Ka::Pistachio was developed. At the University of New South Wales, the original version of L4 was named L4/x86, and L4/MIPS and L4/Alpha were developed. These versions were not portable; later interest in the portability of L4Ka::Pistachio led to the new version NICTA::L4-embedded. NICTA ported it to a number of architectures including ARM and optimized it for use in resource constrained embedded systems. Qualcomm's association with NICTA created the commercial OKL4. The third generation microkernel at NICTA is seL4. At the Dresden University of Technology, they

Fig. 1. (a) Type 1 hypervisor (b) Type 2 hypervisor.

Fig. 2. Microkernel based system.


developed a C++ implementation of the L4 kernel interface, called L4/Fiasco. Fiasco is a preemptible real time kernel and is used in the Dresden Real-Time Operating Systems Project (DROPS) [9]. The third generation microkernel of Fiasco is called Fiasco.OC [10]. We selected Fiasco.OC for our analysis of virtualization on embedded boards; more details are given in the following sections.

Fig. 3. Major reimplementations of L4.

2.2.2. Fiasco.OC Microkernel

Fiasco.OC [10] is a third generation microkernel developed at Dresden University. Fiasco.OC is an object capability system, designed in such a way that different objects provide different services and capabilities provide access rights to these objects. Applications on Fiasco.OC make use of objects which provide services, and these services communicate with each other. The main advantage of using a capability based system is increased security, as a user process cannot access these objects directly. Some object types are task, thread, IPC gate, IRQ, and factory. Each task gets an initial set of capabilities for some of these objects at startup. Capabilities are references to objects; any object interaction requires a capability, i.e., the capability provides the access right to the object. Fiasco.OC runs at the highest privilege level and has only the basic functionalities which cannot otherwise be moved to the user level. It is not a complete operating system and has no device drivers, file system, or network stack. All services other than the basic ones need to be added as user level services. Figure 5 shows the block representation of a Fiasco.OC microkernel based system.

2.2.3. L4 Runtime Environment (L4Re)

A software layer, the L4 Runtime Environment (L4Re) [11], which sits on top of Fiasco, provides a runtime environment to run applications on top of Fiasco. The block representation of L4Re is shown in Figure 4. The interface provided by the Fiasco microkernel is not sufficient to build an application; hence L4Re provides interface functions to create and control threads. L4Re has libraries such as libc and libpthread, which allow users to build applications. Every L4Re application gets an initial set of capabilities to interact with the object. The services running on Fiasco and L4Re are called servers, and there are many standard servers, namely Sigma0, Ned, and Moe. Sigma0 is the initial resource manager and is responsible for handling the page faults of the root task. Moe is the first task, or root task, of L4Re started by the microkernel. It is responsible for bootstrapping the system and provides the basic resource management for the applications running on top of L4Re. Moe initiates the system by starting the init process, which gets access to all the resources managed by Moe and to the interface provided by Sigma0. The default init process is called Ned. Ned configures the system using Lua scripts. Ned starts the applications and the initial services on L4Re and provides communication channels between them.

Fig. 4. L4 runtime environment.

2.3. Related Work on Performance Analysis of Virtualization Platforms

Some of the previous works on analyzing the performance of different virtualization techniques are briefed in this section. Härtig et al. [6] compared the performance of the second generation microkernel L4 to that of a first generation microkernel. They compared the performance of L4Linux with native Linux and MkLinux. Kim et al. [12] developed a profiling tool which collects system wide information from the L4 microkernel using hardware performance counters. They compared the performance of


L4Linux applications with Linux applications. Bruns et al. [13] compared the interrupt latencies and the thread switching times of an L4/Fiasco/L4Linux based application subsystem with FreeRTOS. They also showed that the runtime overhead of the microkernel is relatively small for real time applications. Jose et al. [14] provided an energy related performance analysis of virtual machines on x86. Papaux et al. [15] measured the performance of virtual machines on ARM with KVM as the hypervisor on a TI board. Dall et al. [16] measured the performance of ARM virtualization on boards. They did a performance study of virtual machines with two ARM hypervisors, Xen and KVM. Mogosanu et al. [17] analyzed the performance of Linux running paravirtualized on an L4 microkernel. Yang Xu et al. [18] evaluated the performance of L4 microkernel based virtualization on a mobile platform using LMbench.

3. PERFORMANCE ANALYSIS OF VIRTUALIZATION TECHNIQUES

Depending on the technique by which it is provided, virtualization is classified into full virtualization and paravirtualization.

3.1. Paravirtualization

Paravirtualization provides the environment to run the entire OS in a virtual machine. Operating systems are originally designed to run at the highest privileged mode, but in a virtual machine the OS runs at a lower privilege level. For an architecture to be virtualizable, all sensitive instructions should be a subset of the privileged instructions [1]; by this criterion, ARM and x86 were not originally virtualizable. In paravirtualization, the guest operating system code is modified to replace the nonprivileged sensitive instructions with direct calls to the hypervisor, called hypercalls. The nonprivileged instructions run directly on the CPU [19]. Paravirtualization eliminates the overhead of the trap and emulate method of accessing privileged resources. Its main disadvantage is that the kernel of the guest OS needs to be modified with each kernel version release. This kind of virtualization is provided by Xen, Hyper-V, and L4Linux running on top of the L4 microkernel.

The block diagram of a paravirtualized system on the L4 microkernel is shown in Figure 5. As shown in the figure, Fiasco runs directly on top of the hardware at the highest privilege level and all other services run as user level tasks [20]. The interface provided by the Fiasco microkernel is not sufficient to build an application; hence L4Re provides the runtime environment to execute L4Linux as an application. Paravirtualization thus makes the OS a native L4Re application. L4Linux [21] is a paravirtualized Linux OS modified to run on the L4 runtime environment, which runs on top of Fiasco. L4Linux allows unmodified Linux applications to run on top of it. L4Linux runs as a user level server application on top of Fiasco.

3.2. Hardware Assisted Virtualization

In full virtualization, the guest operating system is unmodified and is unaware that it is running on a hypervisor. Full virtualization is attained in two ways: full virtualization by dynamic binary translation, and hardware-assisted virtualization. In binary translation, the hypervisor emulates the underlying hardware. It emulates the instructions of one architecture on another by translating code. This method is used in QEMU and VMware to provide full virtualization when the instruction set is not virtualizable.

In hardware-assisted virtualization, the microprocessor architecture has special instructions to aid the virtualization of hardware. This is possible only in architectures that support it, e.g., Intel VT and AMD-V. Most ARMv7 Cortex-A7 and Cortex-A15 and ARMv8 Cortex-A53 based processors support hardware-assisted virtualization without the need for binary translation or paravirtualization. Examples are Xen, VMware, and KVM.

Fiasco.OC microkernel based virtualization supports hardware-assisted virtualization, which allows an unmodified guest OS to run in a virtual machine with the L4Re uvmm module. uvmm is the virtual machine monitor (VMM) for the L4Re operating system. It allows guest OSes to be configured and executed on top of L4Re. It provides virtual interfaces to the guest OS to communicate with L4 components. The block representation of full virtualization on a microkernel based system is shown in Figure 6.

Fig. 5. Fiasco.OC microkernel based system.

Fig. 6. Full virtualization on microkernel based system.
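The virtualizability condition cited in Section 3.1 (every sensitive instruction must also be privileged) is a set inclusion, and the instructions that violate it are exactly the ones paravirtualization replaces with hypercalls. The following Python sketch illustrates this. The instruction names and both example instruction sets are hypothetical placeholders chosen for illustration, not a listing of any real ISA.

```python
# Illustrative check of the classical virtualizability condition:
# an instruction set is virtualizable by trap-and-emulate when every
# sensitive instruction is also privileged (i.e., traps in user mode).
# All instruction names below are hypothetical placeholders.

def nonprivileged_sensitive(sensitive, privileged):
    """Return the sensitive instructions that would NOT trap in user mode."""
    return set(sensitive) - set(privileged)


def is_virtualizable(sensitive, privileged):
    """True when the sensitive set is a subset of the privileged set."""
    return not nonprivileged_sensitive(sensitive, privileged)


# Hypothetical ISA A: all sensitive instructions trap.
isa_a_sensitive = {"write_page_table_base", "disable_interrupts"}
isa_a_privileged = {"write_page_table_base", "disable_interrupts", "halt"}

# Hypothetical ISA B: one sensitive instruction runs silently in user mode.
isa_b_sensitive = {"write_page_table_base", "read_status_flags"}
isa_b_privileged = {"write_page_table_base", "halt"}

print(is_virtualizable(isa_a_sensitive, isa_a_privileged))
print(is_virtualizable(isa_b_sensitive, isa_b_privileged))

# These are the instructions a paravirtualized guest kernel would have
# to replace with hypercalls.
print(sorted(nonprivileged_sensitive(isa_b_sensitive, isa_b_privileged)))
```

In this toy model, the second instruction set fails the check because of a single instruction, which is the situation in which either the guest kernel is modified (paravirtualization) or the hardware is extended (hardware-assisted virtualization).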


Table I. Coremark score of native Linux and Linux on QEMU.

No. of CM    Linux on native    Linux on QEMU    Linux on QEMU
processes    machine            with kvm         without kvm
1            2123014            2094554          329504
2            2089303            20300            166303
3            1985703            2020085          123144
4            1911002            1860515          76181
5            174190             1777913          70157
6            1601684            1613795          46282
7            151371             1504535          4138
8            1467997            1432899          32716

Fig. 8. Architecture of Xen Hypervisor.

3.3. QEMU

QEMU works as a Type 2 hypervisor. QEMU allows an unmodified guest OS to run on top of it. It emulates the CPU by dynamic binary translation. On machines which support hardware-assisted virtualization, KVM with QEMU can run a guest OS without binary translation, which gives better performance than dynamic binary translation [22].

Experimental setup: The machine used for testing has an Intel Core i7 with 15.6 GB memory and the Ubuntu 16.04 OS. QEMU is installed on it with Ubuntu as the guest OS. Guest OS performance is evaluated with Coremark. The Coremark scores are given in Table I. QEMU is executed with the options -smp 8 and -enable-kvm.

Table I gives the Coremark score of Ubuntu in the native environment and in the virtualized environment with different QEMU options. The first column in the table shows the number of Coremark processes running in parallel, the second column gives the Coremark score on the host OS, the third column gives the Coremark score on the guest OS with kvm enabled, and the fourth column gives the Coremark score of Linux running on QEMU without kvm. The performance results in Figure 7 show that enabling kvm gives near native performance.

3.4. Xen Hypervisor

The Xen hypervisor is a Type 1 or bare-metal hypervisor [23]. Xen boots from a bootloader like GRUB, and the virtual machines in Xen, called domains, run on top of Xen as shown in Figure 8. Xen has a control domain called dom0 which controls the guest OS and hypervisor. DomUs are the guest OSes, which run at user level with less privilege, i.e., they cannot directly access the hardware and the I/O, and cannot start new domains. Xen runs a modified guest OS with the paravirtualization technique, and an unmodified guest OS on machines with hardware-assisted virtualization like VT-x or AMD-V, known as Xen-HVM.

Experimental setup: The Xen hypervisor runs on a machine with an Intel Core i3 CPU [email protected] GHz × 4, 1.8 GB memory, and the Ubuntu OS. Two DomUs with Ubuntu as the guest OS, each with 4 virtual CPUs, run on the Xen hypervisor.

Table II gives the Coremark results on native Linux and on the Xen hypervisor. In the table, the first column gives the number of Coremark processes running in parallel, the second column gives the result of Coremark on the native machine, the third column gives the result of Coremark on Dom0, and the fourth column gives the results on the guest (Ubuntu), i.e., DomU. Columns 5 and 6 give the results of running Coremark simultaneously on two guests on the Xen hypervisor. Figure 9 shows the results, which indicate that the overhead caused by the Xen hypervisor is negligible.
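The two QEMU configurations compared in Section 3.3 differ only in whether KVM acceleration is enabled. The sketch below builds the corresponding command lines in Python without launching anything. The flags `-smp`, `-m`, `-drive`, and `-enable-kvm` are standard QEMU options, but the disk image name, the memory size, and the qcow2 image format are assumptions made for illustration; the paper does not state them.

```python
import shlex


def qemu_command(disk_image, cpus=8, use_kvm=True, memory_mb=4096):
    """Build an argument list for launching an x86-64 guest under QEMU.

    disk_image and memory_mb are illustrative assumptions; only the
    8 virtual CPUs and the KVM on/off switch come from the text.
    """
    argv = [
        "qemu-system-x86_64",
        "-smp", str(cpus),            # number of virtual CPUs
        "-m", str(memory_mb),         # guest memory in MiB (assumed value)
        "-drive", f"file={disk_image},format=qcow2",
    ]
    if use_kvm:
        # Use hardware-assisted virtualization instead of binary translation.
        argv.append("-enable-kvm")
    return argv


# Hypothetical guest image name.
with_kvm = qemu_command("ubuntu-guest.qcow2", use_kvm=True)
without_kvm = qemu_command("ubuntu-guest.qcow2", use_kvm=False)

print(shlex.join(with_kvm))
print(shlex.join(without_kvm))
```

Keeping the command as a list makes it straightforward to pass to `subprocess.run` on a machine where QEMU is installed, so the same benchmark can be repeated under both configurations.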

Table II. Performance comparison of Xen Hypervisor and native Linux.

                                             Simultaneously running two
                                             Ubuntu guests on Xen Hypervisor
#CM    Native Linux    Dom0       DomU       DomU1      DomU2
1      1108893         1106369    1104832    1102188    1098452
2      1107594         1104623    1104867    781145     780866
3      82944           846207     843414     522786     520901
4      766283          767308     769569     392631     392213
5      612423          612037     645362     341423     312502
6      510687          509417     513655     311659     260083
7      438547          439775     440312     279552     172977
8      38133           385175     385001     155612     146070

Fig. 7. Performance comparison of native Linux and Linux in virtualized environments.


Fig. 9. Coremark results on Xen Hypervisor.

Fig. 10. Performance comparison of native Linux and L4Linux on Zedboard.

4. PERFORMANCE ANALYSIS OF MICROKERNEL BASED VIRTUALIZATION ON EMBEDDED SYSTEMS

We have analyzed the performance of two different types of microkernel based virtualization, i.e., paravirtualization and hardware-assisted virtualization, on low power embedded system boards. The performance of paravirtualization and hardware-assisted virtualization is evaluated on the LS1012A Freedom board. ARM announced its support for hardware-assisted virtualization in 2010, and it is available in ARMv7 Cortex-A15 and Cortex-A7 based boards and ARMv8 Cortex-A53 based boards. Here we have used two different ARM boards for evaluation. The first one is the Freedom board, based on ARMv8. This board has a QorIQ Layerscape 1012A low power communication processor (ARM Cortex-A53) and 512MB QSPI flash, and is configured for 512MB RAM. This board supports hardware-assisted virtualization and thus allows unmodified Linux to run on top of the hypervisor. The second board used for evaluation is the Zedboard. It has a dual core ARM Cortex-A9 based processor. This board does not support hardware-assisted virtualization and hence it is not possible to run unmodified Linux in a virtual machine on it.

We used LMbench, a benchmarking tool, to analyze the performance of the kernel [24] with different virtualization techniques. The test suite consists of many microbenchmarks for measuring the latency and bandwidth of the system. Among them, we selected a few microbenchmarks for the latency and bandwidth analysis.

4.1. CPU Performance Measurements

The CPU performance is measured using the Coremark benchmarking tool. The Coremark score indicates the number of iterations per second. Coremark results on the Zedboard are given in Table III. In the table, the first column gives the number of Coremark instances running simultaneously. The second column gives the Coremark score on L4Linux and the third column gives the Coremark score on a Linux Yocto build without any virtualization. The results show that for CPU intensive applications there is not much difference between the results obtained in the virtual and non-virtual environments, as shown in Figure 10. The Zedboard has 2 cores and hence the Coremark score remains the same for the first two processes. A similar result is obtained for the LS1012A board, which has a single core processor. The Coremark score obtained for a single process on the LS1012A board is approximately 518 and for two processes is approximately 258. The score did not show much variation when executed in the virtual environments created by the paravirtual and hardware-assisted techniques.

4.2. Bandwidth Measurements

The bandwidth measurements are intended to show how fast the system can move data. The bandwidth tests used here include bw_file_rd, which benchmarks the performance of file reads; the parameter open2close/io_only indicates whether the measurement includes the open and close calls or just the input-output. bw_mem rd and bw_mem wr allocate the specified amount of memory, zero it, and measure the reading and writing time to that

Table III. Coremark results on Zedboard.

#CM    Paravirtualized-L4Linux    Native Linux
1      441.20                     442.11
2      441.17                     441.20
3      291.86                     289.87
4      220.23                     220.71
5      176.22                     177.27
6      146.71                     146.92
7      130.71                     128.51
8      110.27                     110.27

Table IV. Bandwidth measurements on Freedom board.

                              Native Linux    Paravirtualized    Full virtualized
                              (MB/s)          (MB/s)             (MB/s)
bw_file_rd 64 k open2close    80094           65191              73781
bw_file_rd 64 k io_only       87713           72036              83668
bw_mem rd                     174859          174184             171644
bw_mem wr                     66269           65854              62364
bw_mem cp                     46533           45050              44027
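The core-count argument of Section 4.1 can be checked against Table III with a small Python sketch. It assumes an idealized model, which is not stated in the paper: N identical CPU-bound Coremark processes share the available cores evenly, so the per-process score stays flat up to the core count and then falls as cores/N. The function names are illustrative, and the data are the Table III values.

```python
def expected_score(single_score, cores, processes):
    """Ideal per-process score when CPU-bound processes share cores evenly.

    Assumed model: no scheduling or cache overhead, so each process gets
    a min(1, cores / processes) share of one core.
    """
    return single_score * min(1.0, cores / processes)


def overhead_percent(native, virtualized):
    """Relative slowdown of the virtualized score with respect to native."""
    return (native - virtualized) / native * 100.0


# Table III: Coremark results on Zedboard (dual core).
l4linux = [441.20, 441.17, 291.86, 220.23, 176.22, 146.71, 130.71, 110.27]
native = [442.11, 441.20, 289.87, 220.71, 177.27, 146.92, 128.51, 110.27]

for n, (para, nat) in enumerate(zip(l4linux, native), start=1):
    model = expected_score(native[0], cores=2, processes=n)
    print(f"{n}  native={nat:7.2f}  model={model:7.2f}  "
          f"overhead={overhead_percent(nat, para):+.2f}%")

# Single-core LS1012A: a score of about 518 for one process should
# roughly halve for two processes under the same model.
print(expected_score(518.0, cores=1, processes=2))
```

Under this model the measured native scores track the ideal curve closely, and the difference between L4Linux and native Linux stays within roughly 2% in every row, which is consistent with the observation that CPU-bound workloads see little virtualization overhead.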


memory. The benchmark bw_mmap_rd creates a memory mapping of the file and then reads the mapping.

The results of the LMbench bandwidth measurements on the ls1012a-frdm board are shown in Table IV and those of the Zedboard are shown in Table V. The results are in units of megabytes moved per second; the larger the value, the better the performance. In the tables, Native Linux represents the performance results obtained without any virtualization technique applied, i.e., Linux on bare metal. The Linux OS used here is built with the Yocto build system. Paravirtualized L4Linux represents the results obtained from L4Linux. Full virtualized represents the results obtained by running unmodified Linux in a hardware-assisted virtual environment. The bandwidth measurements on the ls1012a-frdm board in Table IV show that the system without virtualization has better performance than the full virtualized and paravirtualized systems; even so, the nonvirtualized and full virtualized systems have almost equal or comparable results. The file read operation is slow in paravirtualized Linux. The results on the Zedboard in Table V show that the bandwidth measurements of L4Linux are lower than those of native Linux.

Table V. Bandwidth measurements on Zedboard.

                              Paravirtualized-L4Linux    Native Linux
                              (MB/s)                     (MB/s)
bw_file_rd 64 k open2close    220.02                     310.26
bw_file_rd 64 k io_only       246.21                     337.41
bw_mem rd                     492.40                     735.16
bw_mem wr                     426.47                     430.23
bw_mem cp                     199.82                     239.32

4.3. Latency Measurements

The latency measurements indicate how fast a system can perform control operations. We measured microbenchmarks for context switching and system call overhead for our analysis. The results are shown in Tables VI and VII. The microbenchmark lat_ctx measures the context switching time for the specified number of processes of the specified size. lat_syscall measures the system call latency of open, write, read, and null.

The results of the LMbench latency measurements on the ls1012a-frdm board are shown in Table VI and those of the Zedboard are shown in Table VII. The results are in microseconds; the smaller the value, the lower the latency and the better the performance. In the tables, Native Linux represents the performance results obtained without any virtualization technique applied. The Linux OS used here is built with the Yocto build system. Paravirtualized L4Linux represents the results obtained from L4Linux. Full virtualized represents the results obtained by running unmodified Linux in a hardware-assisted virtual environment. The latency measurements on the ls1012a-frdm board in Table VI show that the non-virtualized system has better performance than the full virtualized and paravirtualized systems, and that L4Linux has higher latency than the other systems. The results on the Zedboard in Table VII show that the latency measurements of L4Linux are not as good as those of native Linux. L4Linux incurs more delay for system calls because of the overhead of running in a virtual environment.

Table VI. Latency measurements on ls1012a-frdm board.

                      Native Linux    Paravirtualization    Full virtualization
                      (µs)            (µs)                  (µs)
lat_ctx size = 0 k    300             878                   386
lat_ctx size = 16 k   1413            1960                  1440
lat_syscall open      547             1155                  791
lat_syscall write     0654            3085                  08611
lat_syscall read      0812            365                   1229
lat_syscall null      0.2382          2.769                 0.542

Table VII. Latency measurements on Zedboard.

                      Paravirtualization    Native Linux
                      (µs)                  (µs)
lat_ctx size = 0 k    1906                  486
lat_ctx size = 16 k   3136                  1435
lat_syscall open      2769                  1066
lat_syscall write     568                   07123
lat_syscall read      676                   106
lat_syscall null      491                   0474

5. CONCLUSION

We have reviewed the major virtualization techniques used in computing environments such as desktops and embedded systems. QEMU, Xen, and L4 have been used as target virtualization platforms for experimentation. Performance analysis of the processing capabilities of these virtualization techniques has been done to find the overheads of virtualization and the relative benefits of the different schemes. It has been found that a significant performance improvement is achieved by enabling the Kernel-based Virtual Machine (KVM) feature in QEMU. We have explored microkernel based virtualization techniques on ARM based embedded boards for performance analysis, and have used this implementation to compare the performance of a microkernel based virtual environment with a nonvirtual environment. The results from LMbench show that the bandwidth and latency measurements of L4Linux are not as good as those of Yocto-built Linux without virtualization, mainly due to the overhead of running L4Linux in a virtual environment. Also, hardware-assisted virtualization on the microkernel gives better performance than paravirtualized L4Linux and slightly lower performance than Yocto-built Linux without virtualization.

Acknowledgment: This research is supported by the Early Career Research Award given to Dr. Bijoy A. Jose by Science and Engineering Research Board,


Department of Science and Technology, Government of India (ECR/2016/000448). The authors would like to thank Adam Lackorzynski from the Technical University of Dresden for his support and valuable suggestions.

References

1. G. J. Popek and R. P. Goldberg, Formal requirements for virtualizable third generation architectures. Communications of the ACM 17, 412 (1974).
2. J. E. Smith and R. Nair, The architecture of virtual machines. Computer 38, 32 (2005).
3. E. Bugnion, J. Nieh, and D. Tsafrir, Hardware and software support for virtualization. Synthesis Lectures on Computer Architecture 12, 1 (2017).
4. D. Mathew and B. A. Jose, Performance analysis of virtualized embedded computing systems. 2017 7th International Symposium on Embedded Computing and System Design (ISED), IEEE (2017), pp. 1–5.
5. G. Heiser, The role of virtualization in embedded systems. Proceedings of the 1st Workshop on Isolation and Integration in Embedded Systems, ACM (2008), pp. 11–16.
6. H. Härtig, M. Hohmuth, J. Liedtke, J. Wolter, and S. Schönberg, The performance of μ-kernel-based systems. ACM SIGOPS Operating Systems Review, ACM (1997), Vol. 31, pp. 66–77.
7. J. Liedtke, On Micro-Kernel Construction, ACM (1995), Vol. 29.
8. K. Elphinstone and G. Heiser, From l3 to sel4 what have we learnt in 20 years of l4 microkernels? Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, ACM (2013), pp. 133–150.
9. H. Härtig, M. Roitzsch, A. Lackorzynski, B. Döbel, and A. Böttcher, L4-virtualization and beyond. Korean Information Science Society Review 2 (2008).
10. L4Fiasco microkernel. https://os.inf.tu-dresden.de/fiasco/.
11. L4Re-L4 runtime environment. https://l4re.org/doc.
12. D. Kim, J. Eom, and C. Park, L4oprof: A performance-monitoring-unit-based software-profiling framework for the l4 microkernel. ACM SIGOPS Operating Systems Review 41, 69 (2007).
13. F. Bruns, S. Traboulsi, D. Szczesny, E. Gonzalez, Y. Xu, and A. Bilgic, An evaluation of microkernel-based virtualization for embedded real-time systems. 2010 22nd Euromicro Conference on Real-Time Systems (ECRTS), IEEE (2010), pp. 57–65.
14. B. A. Jose and A. Agrawal, Improving energy efficiency of virtual machines with timer tick variations. Journal of Low Power Electronics 11, 401 (2015).
15. G. Papaux, D. Gachet, and W. Luithardt, Processor virtualization on embedded linux systems. 2014 6th European Embedded Design in Education and Research Conference (EDERC), IEEE (2014), pp. 65–69.
16. C. Dall, S.-W. Li, J. T. Lim, J. Nieh, and G. Koloventzos, Arm virtualization: Performance and architectural implications. ACM SIGARCH Computer Architecture News 44, 304 (2016).
17. L. Mogosanu, M. Carabas, C. Condurache, L. Gheorghe, and N. Tapus, Evaluating architecture-dependent linux performance. 2015 20th International Conference on Control Systems and Computer Science (CSCS), IEEE (2015), pp. 499–505.
18. Y. Kinebuchi, H. Koshimae, and T. Nakajima, Constructing machine on portable microkernel. Proceedings of the 2007 ACM Symposium on Applied Computing, ACM (2007), pp. 1197–1198.
19. A. Lackorzynski, A. Warg, and M. Peter, Virtual processors as kernel interface. Twelfth Real-Time Linux Workshop (2010), Vol. 2010.
20. H. Schild, A. Lackorzynski, and A. Warg, Faithful virtualization on a real-time operating system. RTLWS11 (2009).
21. L4linux. https://l4linux.org/.
22. F. Bellard, Qemu, a fast and portable dynamic translator. USENIX Annual Technical Conference, FREENIX Track (2005), pp. 41–46.
23. P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield, Xen and the art of virtualization. ACM SIGOPS Operating Systems Review, ACM (2003), Vol. 37, pp. 164–177.
24. L. W. McVoy, C. Staelin et al., Lmbench: Portable tools for performance analysis. USENIX Annual Technical Conference, San Diego, CA, USA (1996), pp. 279–294.

Deepa Mathew received her B.Tech. degree in Electronics and Communication from Mahatma Gandhi University in 2004 and her M.Tech. degree in Communication Engineering from Mahatma Gandhi University in 2015. Prior to her M.Tech., she had 7+ years of industrial experience in embedded systems software development. Currently, she is a Project Assistant and is pursuing a Ph.D. at the Department of Electronics, Cochin University of Science and Technology. Her current research interests include virtualization for embedded platforms, the Internet of Things and its security.

Bijoy A. Jose is an Assistant Professor at the Department of Electronics in Cochin University of Science and Technology. He received his B.Tech. from the School of Engineering, CUSAT and his M.S. from the State University of New York. He received his Ph.D. in Computer Engineering from Virginia Tech and worked at Intel Corporation's California office for 4 years. He is an industry consultant for several firms including KITCO, Pumex, etc. He received the Early Career Research Award from the Department of Science and Technology, Government of India in 2016. He is the principal investigator of multiple projects from DST, IEEE HAC, etc. His areas of interest include cyber security, the Internet of Things and cyber physical systems.


Priyadarsan Patra is an Intel Principal Scientist and serves as the chief validation architect of the Data Center Group's System Validation organization. He holds a track record of 20+ years at Intel in creating and leading world-class research and development of Intel's flagship Server, SoC and Client processors and advanced devices (e.g., Westmere, Haswell, Baytrail, Broxton, Lewisburg, etc.) involving high- and low-power designs, and their validation, debug, and test for several generations of flagship silicon devices and systems. He has published 50+ technical papers and books, and authored over a dozen patents and inventions. He has held leadership roles in various scientific and technological capacities. Dr. Patra founded and chairs the world-wide IEEE-CEDA committee called the System Validation and Debug Technology Committee. He has led several established international technical conferences in the areas of system design, CAD, low-power circuits and validation, in the role of organizing chair or technical program chair. A nominated Senior Member of both the ACM and the IEEE, he has extensive experience in non-profit board leadership, in academic mentoring, and in professional service. He received his Ph.D. from the University of Texas at Austin, Texas, USA and his M.S. in Computer and Information Science from the University of Massachusetts at Amherst, USA.

