Boosting GPU Virtualization Performance with Hybrid Shadow Page Tables

Yaozu Dong, Mochi Xue, Xiao Zheng, Jiajun Wang, Zhengwei Qi, Haibing Guan
Shanghai Jiao Tong University, Intel Corporation

GPU Usage
• Gaming (2D/3D graphics)
• HD video hardware decoding
• High performance computing

High Performance Computing
Shifts computation-intensive workloads to the cloud environment.
• Molecular dynamics simulations
• Machine learning
• Media transcoding
A new computing paradigm: GPU Cloud

gHyvi
An optimized GPU virtualization scheme based on gVirt.
• GMedia: a media transcoding benchmark
• Relaxed Shadow Page Table
• Adaptive Hybrid Page Table Shadowing Policies
Up to 13x the performance of gVirt and 85% of native.

GPU benchmarks
• 2D/3D graphics: for OpenGL and DirectX commands, such as 3DMark, PassMark
• GPGPU: for CUDA and OpenCL commands, such as Rodinia [ATC 2013], Parboil

GMedia
A hardware media transcoding benchmark based on Intel's MSDK (Media Software Development Kit).
• MSDK provides APIs for hardware acceleration.
• A wrapper invokes media functions from MSDK.
• Test cases can run with assigned threads and settings.
• The FPS results reflect the performance.

Massive update issue
[Figure: Native vs. gVirt at 480p with 1, 10, 20, 24, 25, and 30 threads. Highlighted values: Native 452, gVirt 33.]
93% performance degradation.
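The 93% figure can be checked from the two highlighted values on the chart (Native 452, gVirt 33). The helper below is only an illustrative calculation; the function name `degradation_percent` is not part of GMedia or gVirt.

```python
def degradation_percent(native_fps, virtualized_fps):
    """Relative performance loss of the virtualized run, in percent."""
    return (1 - virtualized_fps / native_fps) * 100


# Values read from the "Massive update issue" chart.
print(f"{degradation_percent(452, 33):.1f}%")  # prints 92.7%
```

Rounded to a whole number, this matches the 93% quoted on the slide.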
[Figures: Native vs. gVirt at 720p with 1, 5, 10, 15, and 20 threads, and at 1080p with 1, 4, 5, 6, 7, and 10 threads.]

Shadow page table
[Diagram labels: Guest System with Page Directory Table (PDE) and Page Table (PTE); Host memory with Shadow Page Directory Table (PDE) and Shadow Page Table (PTE).]
• 512 * 1024 * 4KB = 2GB
• Resource partition
• User space isolation

Strict shadow page table
Trap-and-emulation:
Write protect the page table → Page fault happens → Update shadow page table
[Diagram labels: Page table, gVirt, Shadow page table.]

Case profiling
Proportion changes: 21.43% → 79.45%

Breakdown of EPT_VIOLATION
Take "15thread-720p" as an example:
• 62% of all VM_exit events are due to EPT_VIOLATION
• 83% of EPT_VIOLATION events are caused by PTE access

Frequency
The frequency of updates in 10s: up to 7.5k times.

Hot page tables
• Continuous updates
• Same table

Conclusion of massive update issue
The VM is frequently swapping graphics memory pages. It modifies the entries of the page table massively.
• Large number of updates (7.5k in 10s)
• Updates focus on certain pages (hot pages)
• Updates are continuous (on the same page)
Modifications lead to busy trap-and-emulation. Eventually, overhead happens. (Up to 95%)

gHyvi architecture
[Diagram labels: Host VM, Guest VM, Native GFX Driver, Page Table, gVirt (Strict SPT), gHyvi (Relaxed SPT), VMM, GPU; paths: Hypercall, Pass Through, Trap.]

GPU programming model
[Diagram labels: CPU, tail, head, GPU.]
The commands fed by the CPU won't take effect until they are fetched by the GPU.
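The head/tail programming model can be illustrated with a toy ring buffer. This is a minimal sketch, assuming the CPU appends commands at the tail and the GPU consumes them from the head; the class `CommandRing` and its methods are hypothetical names for illustration and do not correspond to any driver or gVirt interface.

```python
class CommandRing:
    """Toy command ring: the CPU advances tail, the GPU advances head."""

    def __init__(self, size):
        self.slots = [None] * size
        self.head = 0  # next slot the GPU will fetch
        self.tail = 0  # next slot the CPU will fill

    def pending(self):
        """Commands submitted by the CPU but not yet fetched by the GPU."""
        return (self.tail - self.head) % len(self.slots)

    def cpu_submit(self, command):
        """CPU side: queue a command; it has no effect until fetched."""
        if (self.tail + 1) % len(self.slots) == self.head:
            raise RuntimeError("ring is full")
        self.slots[self.tail] = command
        self.tail = (self.tail + 1) % len(self.slots)

    def gpu_fetch(self):
        """GPU side: take the oldest pending command, or None if idle."""
        if self.head == self.tail:
            return None
        command = self.slots[self.head]
        self.head = (self.head + 1) % len(self.slots)
        return command


ring = CommandRing(size=4)
ring.cpu_submit("update page table entry")
ring.cpu_submit("render")
print(ring.pending())    # prints 2
print(ring.gpu_fetch())  # prints update page table entry
```

Between `cpu_submit` and `gpu_fetch` the queued commands are inert, which is the window in which page table changes do not yet need to be visible to the GPU.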
Relaxed shadow page table
[Diagram labels: Page table, gHyvi, Shadow page table.]
• Remove write protection
• Reconstruct

Page table reconstruction
• Step 1: Take a snapshot of the guest page table
• Step 2: Massive update
• Step 3: Compare with the snapshot
• Step 4: Reconstruct the parts that differ
[Diagram, all steps: guest page table, host shadow page table, snapshot. The host side is labeled "Strict shadow page table" in Steps 1 and 2 and "Relaxed shadow page table" in Steps 3 and 4.]

Hybrid page table shadowing
4 Policies for Performance Tuning
• Full relaxed
• Static partial relaxed
• Dynamic partial relaxed
• Dynamic segmented partial relaxed

Static partial relaxed
• Select 50, 100, 200, or 300 hot pages
• Switch them into relaxed mode

Dynamic partial relaxed
• All the pages are in strict mode at first
• A page switches to relaxed mode once it is touched
…

Dynamic segmented partial relaxed
• All the pages are in strict mode at first
• A page switches to relaxed mode once it is touched
• The relaxed pages are reset after reconstruction
…

Evaluation
4 policies of page table shadowing
• Full reconstruction
• Static partial reconstruction
• Dynamic partial reconstruction
• Dynamic segmented partial reconstruction
Linux 2D/3D performance
Windows 2D/3D performance

Configuration
Hardware
• CPU: 4th-generation Intel Haswell i5 (2.4GHz)
• RAM: 8GB
• HDD: Seagate 500GB
• GPU: Intel Processor Graphics with 2GB global graphics memory
Software
• Linux VM: 64-bit Ubuntu 12.04
• Windows VM: 64-bit Windows 7
• Xen 4.3
• VM configuration: 4 VCPUs and 2GB system memory

Evaluation: full relaxed
• The performance of normal cases is degraded.
• The performance of issue cases is improved.
[Figure: 480p results with 1, 10, 20, 24, 25, and 30 threads.]
[Figures: gVirt vs. Full relaxed at 720p with 1, 5, 10, 15, and 20 threads, and at 1080p with 1, 4, 5, 6, 7, and 10 threads.]

Evaluation: static partial relaxed
• The performance of normal cases becomes worse with more relaxed pages.
• The performance of issue cases is improved.
• The coverage of hot pages affects the performance.
[Figure: Native, gVirt, and Static Partial 50/100/200/300 at 480p with 1, 10, 20, 24, 25, and 30 threads.]

Evaluation: dynamic relaxed
• Dynamic segmented partial works fine on normal cases.
• Up to 13x the performance of gVirt
• 85% of native
[Figures: Native, gVirt, Dynamic Partial, and Dynamic Segmented Partial at 480p (1, 10, 20, 24, 25, 30 threads), 720p (1, 5, 10, 15, 20 threads), and 1080p (1, 4, 5, 6, 7, 10 threads).]

Linux 2D/3D performance
Slightly better than gVirt.
[Figure: normalized performance (0 to 1) of gVirt vs. gHyvi on lightsmark, openarena, urbanterror, xonotic, gnome-system-monitor, firefox-asteroids, firefox-scrolling, and midori-zoomed.]

Windows 2D/3D performance
Discrepancy is acceptable.
[Figure: normalized performance (0 to 1) of gVirt vs. gHyvi on SM2.0, HDR/SM3.0, Pass2D, and Heaven.]

Summary
An optimized GPU virtualization scheme based on gVirt.
• New shadow page table: relaxed shadow page table
• Adaptive hybrid page table shadowing policies
• Up to 13x performance improvement
Source code is available: https://01.org/igvt-g

Q&A
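The four reconstruction steps and the dynamic segmented partial relaxed policy described earlier can be sketched as a toy model. This is a minimal illustration under stated assumptions, not gHyvi's implementation: a page table is a plain list, a "trap" is a counter increment, the snapshot is taken when the page is first touched, and reconstruction is assumed to run before commands are submitted to the GPU. The names `ShadowedPageTable`, `guest_write`, `reconstruct`, and `run` are hypothetical.

```python
class ShadowedPageTable:
    """Toy model of one guest page table and its host-side shadow."""

    def __init__(self, guest, allow_relaxed=True):
        self.guest = list(guest)
        self.shadow = list(guest)
        self.snapshot = list(guest)
        self.allow_relaxed = allow_relaxed  # False models strict-only shadowing
        self.relaxed = False                # every page starts in strict mode
        self.traps = 0                      # number of trap-and-emulate events

    def guest_write(self, index, value):
        """The guest driver modifies one page table entry."""
        self.guest[index] = value
        if self.relaxed:
            return  # no write protection, so no trap; the shadow is now stale
        # Strict mode: the write-protected page faults and is emulated.
        self.traps += 1
        self.shadow[index] = value
        if self.allow_relaxed:
            # First touch: snapshot the table, then drop write protection.
            self.snapshot = list(self.guest)
            self.relaxed = True

    def reconstruct(self):
        """Compare the guest table with the snapshot and patch the shadow."""
        changed = [i for i, (old, new) in enumerate(zip(self.snapshot, self.guest))
                   if old != new]
        for i in changed:
            self.shadow[i] = self.guest[i]
        self.snapshot = list(self.guest)
        self.relaxed = False  # segmented policy: reset to strict afterwards
        return changed


def run(allow_relaxed):
    """Apply 100 updates to 4 hot entries and return the trap count."""
    table = ShadowedPageTable([0] * 8, allow_relaxed=allow_relaxed)
    for value in range(1, 101):
        table.guest_write(value % 4, value)
    table.reconstruct()  # assumed to happen before GPU submission
    assert table.shadow == table.guest
    return table.traps


print(run(allow_relaxed=False), run(allow_relaxed=True))  # prints 100 1
```

In this toy run the strict-only table traps on every update, while the relaxed table traps once and repairs the shadow with a single comparison pass. The model ignores the cost of the snapshot and the comparison, which is the overhead the evaluation slides attribute to normal cases.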