Quick viewing(Text Mode)

HEP Specific Benchmarks of Virtual Machines on Multi-Core CPU

HEP Specific Benchmarks of Virtual Machines on Multi-Core CPU

HEP Speci c Benchmarks of Virtual Machines on multi-core CPU Architectures

M. Alef

Forschungszentrum Karlsruhe, Karlsruhe, Germany CPU Model Mem n (GB) Cores Mainboard Year I. Gable AMD Opteron 246 (2.0 GHz SC) 4 2 MSI-9145 2003 Abstract Department of Physics and Astronomy, University of Victoria, Victoria, Canada 270 (2.0 GHz DC) 8 4 MSI-9145 2005 2376 (2.3 GHz, QC, Shanghai) 16 8 Supermicro H8DMU+ 2009 Virtualization technologies such as Xen can be used in order to sat- Intel Xeon 3.00 GHz, SC (Nocona) 2 2 HP ProLiant DL360 G4, HT off 2004 isfy the disparate and often incompatible system requirements of 5160 (3.0 GHz, DC, Woodcrest) 6 4 Intel Server Board S5000VCL 2006 VM - Physical Comparison E5345 (2.33 GHz, QC, Clovertown) 16 8 Supermicro CSE-812L-520CB 2007 dierent user groups in shared-use computing facilities. This capa- i386 VM − Physical Comparison E5405 (2.0 GHz, QC, Harpertown) 16 8 Dell PowerEdge 2950 2007 bility is particularly important for HEP applications, which often 70 L5420 (2.5 GHz, QC, Harpertown) 16 8 Supermicro X7DCT 2008 have restrictive requirements. The use of virtualization adds ex- TableTable 1. TheThe benchmarkingbenchmarking testbed testbed is assembledmachines. from Note machines the SC, givingDC, and a sample QC correspond spanning theto 60 lastSingle, 3-5 Dual, years and of of Quad common Core. CPU types for HEP worker nodes. Above ’year’ refers to year of ibility, however, it is essential that the virtualization technology CPUavailability Model of that Generation of CPU.Mem n place little overhead on the HEP application. We present an evalua- CPU n VM (GB) Cores1 Mainboard VM 1 VM % diff % diYearff 50 AMD OpteronType Cores Type Physical per core per box per core per box VM Hypervisor Kernel Disk tion of the practicality of running HEP applications in multiple Vir- 1 VM per core 246 (2.0Opteron GHz SC) 246 2 i386 12.564 2 12.32 MSI-9145 12.29 -1.95 -2.152003 Type Version Version Access 270 (2.0Opteron GHz DC) 270 4 i386 24.688 4 24.98 MSI-9145 23.93 1.22 -3.042005 1 VM per box i386 2.6.18-92.1.13.el5xen i686 tual Machines (VMs) on a single multi-core system. We use 40 2376 (2.3Opteron GHz, QC, 2376 Shanghai) 8Xen i386 3.0.3 16 62.94 8 64.92 Supermicro 60.45 H8DMU+tap:aio 3.16 -3.952009 Physical 64 2.6.18-92.1.13.el5xen x86 64 the benchmark suite used by the HEPiX CPU Benchmarking Work- x86 64 61.05 64.24 62.46 5.23 2.31 Intel XeonXeon Nocona 2 i386 10.71 10.41 10.62 -2.77 -0.85 30 Intel Xeon 3.00 GHz, SC (Nocona) 2 2 HP ProLiant DL360 G4, HT off 2004 ing Group to give a quantitative evaluation relevant to the HEP HEP−SPEC06 Xeon 5160 4 i386Hardware 37.52 38.00 VM Memory 38.07 1.27 1.47 5160 (3.0Xeon GHz, E5345 DC, Woodcrest) 8 i386Type 58.886 4 58.65Allocation Intel Server 58.63 Board S5000VCL -0.39 -0.43 2006 community. Benchmarks are packaged inside VMs and then the E5345 (2.33Xeon GHz, E5405 QC, Clovertown) 8 i386 16 57.22 8 57.08 Supermicro 57.30 CSE-812L-520CB -0.24 0.14 2007 E5405 (2.0 GHz, QC, Harpertown)All AMD 16 and 8 Dell PowerEdge 2950 2007 20 x86 64 57.04 56.68n 1900 MB 56.58 -0.63 -0.81 Quad Core × VMs are booted onto a single multi-core system. Benchmarks are L5420 (2.5Xeon GHz, L5420 QC, Harpertown) 8 i386 16 67.36 8 66.99 Supermicro 67.34 X7DCT -0.54 -0.032008 Intel Nocona n 870 MB x86 64 67.31 68.22× 67.23 1.35 -0.12 then simultaneously executed on each VM to simulate highly Intel Woodcrest n 870 MB Table 1. The benchmarking testbed is assembled from machines× giving a sample spanning the 10 Table 2. The benchmark scores for all machines in the testbed. The scores are in loaded VMs running HEP applications. These techniques are ap- lastTable 3-5 years 3. ofThe of common benchmark CPU results types on for all HEP CPU worker arctectures. nodes. Above Note that ’year’ n refers Cores to indicates year of the availabilityHEP-SPEC06. of that Generation Note that of the CPU. number of cores is per machine and not per CPU. plied to a variety of multi-core CPU architectures and VM con gu- Tablenumber 2. ofVM cores configurations per machine used and for not benchmarking. per CPU. Each VM is given the memory listed above 0 when running on a particular hardware type where n is the number of VCPUs. rations. VM Hypervisor Kernel Disk Xeon 5160 5. Conclusion Opteron 246Opteron 270Opteron 2376Xeon Nocona Xeon E5345 Xeon E5405 Xeon L5420 Type Version Version Access ofOur CPU results with have sufficient shown that memory it is indeed to run very HEP-SPEC06. feasible to run This highly is not CPU a significant loaded virtual because machines this i386 2.6.18-92.1.13.el5xen i686 hardwareon current is generation nearing itsXen multi-core end 3.0.3 of life CPUs. and does The not full have cross sufficient section memorytap:aio of CPU to measured run current showed LHC x86 64 2.6.18-92.1.13.el5xen x86 64 software.performance The boosts full selection in some of cases hardware (outline is the listed cases) in Table and some 1. performance penalties in (outline i386 VM % Dierence from Physical the cases). This results open the possibilities of running multiple hep jobs each wrapped in a i386 VM % Difference from Physical 4 3.3.machine The on Virtual a single Machines multi-coreHardware box will little penalty. VM Memory 1 VM per core The benchmarked virtual machinesType were createdAllocation in two varieties: i386 and x86 64 xen VMs both 1 VM per box with the 2.6.18-92.1.6.el5xen kernel. For easy portability between benchmarking physical hosts Benchmark Technique 6. Acknowledgements All AMD and 3 VM were created as simple disk images storedn as1900 regular MB files on the domain0 machine. The The financial support ofQuad the Core National Intels Sciences and Engineering Research Council of VMs accessed their disk using the blktap driver with× asynchronous I/O achieved with the Xen Canada (NSERC) and theIntel German Nocona Federal Ministryn 870 MBof Education and Research (BMBF) is configurationacknowledged. file option ’tap:aio’ [9]. For a full list× of VM specification please refer to Table 2 Intel Woodcrest n 870 MB All measurements employed the HEP-SPEC06 bechmark because it 2 × has been shown to be linearly related to actual HEP application References Table 3. Virtual Machine con guration; n is the number Table[1] 2.AgarwalVM configurations A, Charbonneauof physical used A, cores. Desmarais for benchmarking. R, Enge R, Gable Each I, VM Grundy is given D, Penfold-Brown the memory D, listed Seuster above R, Sobie 1 performance. HEP-SPEC06 is derived from SPEC CPU2006 from the when runningR, Vanderster on a particular D C 2008 hardware Deploying HEP type Applications where n is Using the number Xen and of Globus VCPUs. Virtual Workspaces Proc. of Computing in High Energy and Nuclear Physics 2007 J. Phys.: Conf. Ser. 119 062002 doi:10.1088/1742- Standard Performance Evaluation Corporation. HEP-SPEC06 came 6596/119/6/062002 0 x86_64[2] MichelottoVM Comparison M, Alef M, Bly M J, Benelli G, Brasolin F, Degaudenzu H, De Salvo A, Gable I, Hirstius A, about from the 2007-2008 eorts of the HEPiX CPU Benchmarking of CPU withHristov sufficient P, Iribarren memory A, Meinhard to run H, HEP-SPEC06. Wegner P 2009 A This comparison is not of a HEP significant code with because SPEC benchmark this hardwareon is nearingmulticore its worker end nodes, of life to andappear does in Proc. not of have Computing sufficient in High memory Energy to Physics run current 2009 LHC −1 [3]x86_64Standard VM Performance − Physical Evaluation Comparison Corporation website www.spec.orgx86_64 VM % Difference from Physical working group to identify a suitable replacement for the now re- software.[4] HEPiX The full organization selection website: of hardwarewww.hepix.org is listed in Table 1.

% Difference from Physical 70 6 [5] HEPiX presentation: http://indico.twgrid.org/getFile.py/access?contribId=140&sessionId=1 VM per core tired (February 2007) SPEC int 2000 benchmark that has been 91&resId=2&materialId=slides&confId=471 1 VM per box −2 3.3.60 The Virtual Machines 5 The benchmarked[6] Spradling, C.D., virtual 2007 machines SPEC CPU2006 were created benchmark in twotools varieties: Architecture i386 and x86 News64 March xen VMs 2007 both 130-4 Vol popular in the HEP community. We selected Scienti c Linux (SL) as 50 35 doi:10.1145/1241601.1241625 4 with[7] theHenning 2.6.18-92.1.6.el5xen J L 2000 SPEC kernel.CPU2000: For measuring1 VM easy per core portability CPU performance between in the benchmarking New Millennium physicalComputer hosts, 33, 7, VM were created as simple disk images stored as regular files on the domain0 machine. The the for all benchmarks because of its wide use in −3 40 pp 28-35 1 VM per box 3 VMs[8] accessedThe Scientific their diskLinux using Distribution: the blktapwww.scientificlinux.orgPhysical driver with asynchronous I/O achieved with the Xen [9] Matthews J N 2008 Running Xen: A Hands-On Guide to the Art of Virtualization (Boston: Prentice Hall) the HEP community. We selected the Xen Virtual Machine Monitor configuration30 file option ’tap:aio’ [9]. For a full list of VM2 specification please refer to Table 2 [10] Youseff, L.; Wolski, R.; Gorda, B.; Krintz, C., ”Evaluating the Performance Impact of Xen on MPI and −4 HEP−SPEC06 Process Execution For HPC Systems,” Virtualization Technology% Difference from Physical in Distributed Computing, 2006. VTDC because of its ease of use with SL 5.2. The complete list of machines 20 1 2006. First International Workshop on , vol., no., pp.1-1, 17-17 Nov. 2006 doi=10.1109/VTDC.2006.4 Xeon 5160 Xeon E5345 Xeon E5405 Xeon L5420 benchmarked is available in Table 1. Opteron 246Opteron 270Opteron 2376Xeon Nocona 10 0

0 −1 Opteron 2376 Xeon E5405 Xeon L5420 Opteron 2376 Xeon E5405 Xeon L5420 We are most interested in investigating the practicality of running n VMs where n is the number of cores per machine. In order to Benchmarking Con guration evaluate this eciency of this method we compare benchmarks in 1 VM per core 1 VM per box Physical Machine three con gurations: (1) 1 VM per core, (2) 1 VM using all the physi- cal cores (i.e 1 VM per box), (3) the physical machine. Please refer to VM VM VM VM Figure ‘Benchmarking Con guration’. 1 2 3 4 Domain VM 1 Domain Domain 0 VM VM VM VM 5 6 7 8 0 8 VCPU 0 8 CPU 1 VM per core - n VMs are booted where n is the number of cores. Xen Hypervisor Xen Hypervisor Xen Hypervisor Each VM is given a memory allocation as listed in Table 3. For Dual CPU Hardware Dual CPU Hardware Dual CPU Hardware example, an 8 core Intel Harpertown box with 16 GB of memory will have 8 VMs booted with 1900 MB of memory each, leaving 1184 MB of memory for the domain0 machine. Each virtual disk contains the SPEC CPU 2006 code. The benchmarks are then Conclusion pre-compiled with HEP-SPEC appropriate ags on each VM. The Our results have shown that, in terms of CPU performance, it is mance dierences between the physical box in all VM con gura- HEP-SPEC06 benchmark is then executed simultaneously on all indeed very feasible to run highly loaded virtual machines on cur- tions. For example the Xeon L5420 (Harpertown) showed perfor- 8 VMs. This causes all VMs to compete for the resources of the rent generation multi-core CPUs. We established this using the mance loss of -0.51% and -0.03% with i386 VMs, in the 1 VM per physical CPUs in much the same way that the individual HEP-SPEC06 benchmark which has been proven to map to real HEP core and 1 VM per box cases respectively. The largest dierence threads of the HEP-SPEC benchmark compete for resources on application performance. No VM benchmark suered more then a among Intel CPUs is the 2003 generation Xeon 3.0 GHz(Nocona) a multi-core box. 4% decrease in performance relative to the physical box. where a -2.77% decrease is seen in the 1 VM per core case. 1 VM per box - In this case a virtual machine is created with VCPUs The 2008 generation Opteron 2376 (Shanghai) showed striking There now appears to be no CPU performance barriers for the wide equal to the number of physical cores on the box. The bench- performance gains of 3.16% (i386 VM) and 5.23% (x86_64 VM) spread adoption of virtualization technologies such as Xen in the mark is then executed in the same way as typical HEP-SPEC06 when running 1 VM per core. However, the 1 VM per box case suf- HEP community. This could enable HEP applications to take advan- run with VCPUs standing in for physical cores. fered the largest VM performance decrease (-3.95%) of any CPU tage of the exibility oered by completely virtualized software benchmarked. The dual core Opteron 270 behaved similarly with stacks for running on computing resources which were not com- Physical Machine - The machine was benchmarked in exactly the smaller scale dierences. patible with HEP software requirements. same way as any HEP-SPEC06 becnhmark run would be done; one benchmark process per physical core. All quad core Intel CPUs benchmarked exhibit negligible perfor- Note: Full references and further details available from the corresponding paper.