Boosting GPU Virtualization Performance with Hybrid Shadow Page Tables
Yaozu Dong, Mochi Xue, Xiao Zheng, Jiajun Wang, Zhengwei Qi, Haibing Guan
Shanghai Jiao Tong University, Intel Corporation

GPU Usage
• Gaming (2D/3D graphics)
• HD video hardware decoding
• High-performance computing
High-performance computing shifts computation-intensive workloads to the cloud environment.
• Molecular dynamics simulations
• Machine learning
• Media transcoding
A new computing paradigm: GPU cloud

gHyvi
An optimized GPU virtualization scheme based on gVirt.
• GMedia: a media transcoding benchmark
• Relaxed shadow page table
• Adaptive hybrid page table shadowing policies
Up to 13x the performance of gVirt and 85% of native.

GPU benchmarks
• 2D/3D graphics: for OpenGL and DirectX commands, such as 3DMark and PassMark
• GPGPU: for CUDA and OpenCL commands, such as Rodinia [ATC 2013] and Parboil

GMedia
A hardware media transcoding benchmark based on Intel’s MSDK (Media Software Development Kit).
MSDK provides APIs for hardware acceleration.
A wrapper invokes media functions from MSDK.
Test cases can run with assigned threads and settings.
The FPS results reflect the performance.

Massive update issue
[Figure: GMedia FPS for Native vs. gVirt at 480p with 1, 10, 20, 24, 25, and 30 threads. Labeled values: Native 452, gVirt 33.]
93% performance degradation.
[Figure: GMedia FPS for Native vs. gVirt at 720p (1, 5, 10, 15, and 20 threads) and at 1080p (1, 4, 5, 6, 7, and 10 threads).]

Shadow page table
[Figure: page directory table and page table (PDE, PTE) in the guest system; shadow page directory table and shadow page table (PDE, PTE) pointing to host memory.]
• 512 * 1024 * 4KB = 2GB
• Resource partition
• User space isolation

Strict shadow page table
Trap-and-emulation
Write-protect the page table → Page fault happens → Update shadow page table
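The trap-and-emulation flow above can be illustrated with a toy model. This is only a sketch under stated assumptions: the class, its fields, and the `guest_to_host` translation are hypothetical names for illustration and are not taken from gVirt. It shows why a strict shadow page table costs one trap per guest page table write.

```python
class StrictShadowPageTable:
    """Toy model: every guest PTE write traps and is emulated by the hypervisor."""

    def __init__(self, num_entries, guest_to_host):
        self.guest = [0] * num_entries       # guest page table (write-protected)
        self.shadow = [0] * num_entries      # shadow page table used by the GPU
        self.guest_to_host = guest_to_host   # hypothetical guest-to-host frame translation
        self.traps = 0                       # number of emulated writes

    def guest_write(self, index, guest_frame):
        # The page is write-protected, so the write faults into the hypervisor.
        self.traps += 1
        # The hypervisor emulates the write and keeps the shadow entry in sync.
        self.guest[index] = guest_frame
        self.shadow[index] = self.guest_to_host(guest_frame)


spt = StrictShadowPageTable(8, guest_to_host=lambda gfn: gfn + 0x1000)
for i in range(8):
    spt.guest_write(i, guest_frame=i + 1)
print(spt.traps)  # one trap per PTE write
```

In this model the trap count grows linearly with the number of PTE writes, which is the cost that matters when a workload updates page tables heavily.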
[Figure: gVirt sits between the guest page table and the shadow page table.]

Case profiling
Proportion changes: 21.43% → 79.45%
Breakdown of EPT_VIOLATION
Take “15thread-720p” as an example:
• 62% of all VM exits are due to EPT_VIOLATION
• 83% of EPT_VIOLATION exits are caused by PTE access
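Combining the two percentages above gives the share of all VM exits that come from PTE accesses. The short calculation below is only a worked example of that arithmetic; the variable names are illustrative.

```python
ept_share_of_exits = 0.62  # EPT_VIOLATION share of all VM exits (from the slide)
pte_share_of_ept = 0.83    # PTE-access share of EPT_VIOLATION exits (from the slide)

# Share of all VM exits that are caused by PTE accesses.
pte_share_of_exits = ept_share_of_exits * pte_share_of_ept
print(f"{pte_share_of_exits:.1%}")
```

So, for this test case, roughly half of all VM exits trace back to page table entry writes.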
[Figure: 62% EPT_VIOLATION, of which 83% PTE_ACCESS.]

Frequency
The frequency of updates in 10s.
Up to 7.5k times.

Hot page tables
[Figure: hot page tables.]

Continuous updates
[Figure: consecutive updates hit the same table.]

Conclusion of massive update issue
The VM frequently swaps graphics memory pages, which massively modifies the entries of the page table.
• Large number of updates (7.5k in 10s)
• Updates focus on certain pages (hot pages)
• Updates are continuous (on the same page)
Modifications lead to busy trap-and-emulation. Eventually, overhead builds up (up to 95%).

gHyvi architecture
[Figure: gHyvi architecture. Host VM: native GFX driver, gVirt, page table. Guest VM: native GFX driver, page table. gHyvi VMM: strict SPT and relaxed SPT. GPU. Legend: hypercall, pass-through, trap.]

GPU programming model
[Figure: command buffer with the CPU at the tail and the GPU at the head.]
The commands fed by the CPU won’t take effect until they are fetched by the GPU.

Relaxed shadow page table
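A minimal sketch of the head/tail command model from the GPU programming model slide, assuming a simple ring buffer. The class and method names are hypothetical and are not gVirt or gHyvi code. It illustrates that a command submitted by the CPU has no effect until the GPU fetches it, which suggests the shadow page table only needs to be up to date by the time commands are fetched.

```python
class CommandRing:
    """Toy command buffer: the CPU writes at the tail, the GPU reads at the head."""

    def __init__(self, size):
        self.slots = [None] * size
        self.head = 0  # next slot the GPU will fetch
        self.tail = 0  # next slot the CPU will fill

    def pending(self):
        # Commands submitted by the CPU but not yet fetched by the GPU.
        return (self.tail - self.head) % len(self.slots)

    def submit(self, command):
        # CPU side: place a command and advance the tail.
        next_tail = (self.tail + 1) % len(self.slots)
        if next_tail == self.head:
            raise BufferError("ring is full")
        self.slots[self.tail] = command
        self.tail = next_tail

    def fetch(self):
        # GPU side: take the oldest command and advance the head.
        if self.head == self.tail:
            return None  # nothing submitted yet
        command = self.slots[self.head]
        self.head = (self.head + 1) % len(self.slots)
        return command


ring = CommandRing(4)
ring.submit("draw")
ring.submit("blit")
print(ring.pending())  # both commands are queued, none has taken effect
print(ring.fetch())    # the GPU consumes the oldest command first
```

One slot is deliberately left unused so that a full ring can be distinguished from an empty one.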
• Remove write protection from the guest page table.
• gHyvi reconstructs the shadow page table from the guest page table.
[Figure: guest page table, gHyvi, and shadow page table.]

Page table reconstruction
Step 1: Take a snapshot of the guest page table
Step 2: Massive update
Step 3: Compare with the snapshot
Step 4: Reconstruct the different part
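The four steps above can be sketched as follows. This is a toy model under stated assumptions: the page tables are plain lists, and `reconstruct` and `to_host` are hypothetical names for illustration rather than gHyvi functions.

```python
def reconstruct(shadow, snapshot, guest, guest_to_host):
    """Re-shadow only the entries that differ from the snapshot; return how many."""
    changed = [i for i, (old, new) in enumerate(zip(snapshot, guest)) if old != new]
    for i in changed:
        shadow[i] = guest_to_host(guest[i])
    return len(changed)


def to_host(gfn):
    # Hypothetical guest-to-host frame translation.
    return gfn + 0x1000


guest = [1, 2, 3, 4]
shadow = [to_host(g) for g in guest]

snapshot = list(guest)  # Step 1: take a snapshot of the guest page table
guest[1] = 20           # Step 2: massive update, with no trap per write
guest[3] = 40
guest[1] = 21           # repeated writes to the same entry cost nothing extra
updated = reconstruct(shadow, snapshot, guest, to_host)  # Steps 3 and 4
print(updated)          # number of entries that had to be re-shadowed
```

Three guest writes result in only two shadow updates here, because the comparison sees just the final state of each entry.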
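[Figure: in each step, the guest page table is on the guest side; the shadow page table (labeled strict in steps 1 and 2, relaxed in steps 3 and 4) and the snapshot are on the host side.]

Hybrid page table shadowing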
4 policies for performance tuning:
• Full relaxed
• Static partial relaxed
• Dynamic partial relaxed
• Dynamic segmented partial relaxed

Static partial relaxed
• Select 50, 100, 200, or 300 hot pages
• Switch them into relaxed mode

Dynamic partial relaxed
• All the pages are in strict mode at first
• A page switches to relaxed mode once it’s touched

Dynamic segmented partial relaxed
• All the pages are in strict mode at first
• A page switches to relaxed mode once it’s touched
• The relaxed pages are reset after reconstruction
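The two dynamic policies above differ only in what happens after reconstruction, which the following sketch tries to make concrete. It is a simplified model, not gHyvi code: the `Mode` and `DynamicPolicy` names are hypothetical, and it assumes that the first write to a strict page traps before the page is relaxed.

```python
from enum import Enum


class Mode(Enum):
    STRICT = "strict"
    RELAXED = "relaxed"


class DynamicPolicy:
    """Toy model of the dynamic partial and dynamic segmented partial policies."""

    def __init__(self, num_pages, segmented):
        self.mode = [Mode.STRICT] * num_pages  # all pages start in strict mode
        self.segmented = segmented

    def on_guest_write(self, page):
        """Return True if the write traps (the page was still strict)."""
        if self.mode[page] is Mode.STRICT:
            self.mode[page] = Mode.RELAXED     # relax the page once it is touched
            return True
        return False

    def on_reconstruction(self):
        # Only the segmented variant resets relaxed pages back to strict.
        if self.segmented:
            self.mode = [Mode.STRICT] * len(self.mode)


partial = DynamicPolicy(num_pages=4, segmented=False)
segmented = DynamicPolicy(num_pages=4, segmented=True)
for policy in (partial, segmented):
    policy.on_guest_write(2)   # first touch traps and relaxes the page
    policy.on_reconstruction()
print(partial.on_guest_write(2))    # page stayed relaxed, so no trap
print(segmented.on_guest_write(2))  # page was reset, so the write traps again
```

The static partial policy is not modeled here; under the same assumptions it would amount to marking a fixed set of hot pages as relaxed up front.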

Evaluation
4 policies of page table shadowing
• Full reconstruction
• Static partial reconstruction
• Dynamic partial reconstruction
• Dynamic segmented partial reconstruction
Linux 2D/3D performance
Windows 2D/3D performance

Configuration
Hardware
CPU: 4th-generation Intel Haswell i5 (2.4GHz)
RAM: 8GB
HDD: Seagate 500GB
GPU: Intel Processor Graphics with 2GB global graphics memory
Software
Linux VM: 64-bit Ubuntu 12.04
Windows VM: 64-bit Windows 7
Hypervisor: Xen 4.3
VM configuration: 4 VCPUs and 2GB system memory

Evaluation: full relaxed
The performance of normal cases is degraded.
The performance of issue cases is improved.
[Figure: GMedia FPS for gVirt vs. Full at 480p (1, 10, 20, 24, 25, and 30 threads), 720p (1, 5, 10, 15, and 20 threads), and 1080p (1, 4, 5, 6, 7, and 10 threads).]

Evaluation: static partial relaxed
The performance of normal cases becomes worse with more relaxed pages.
The performance of issue cases is improved.
The coverage of hot pages affects the performance.
[Figure: GMedia FPS at 480p (1, 10, 20, 24, 25, and 30 threads) for Native, gVirt, Static Partial 50, Static Partial 100, Static Partial 200, and Static Partial 300.]

Evaluation: dynamic relaxed
Dynamic segmented partial works fine on normal cases.
Up to 13x the performance of gVirt.
85% of native.
[Figure: GMedia FPS for Native, gVirt, Dynamic Partial, and Dynamic Segmented Partial at 480p (1, 10, 20, 24, 25, and 30 threads), 720p (1, 5, 10, 15, and 20 threads), and 1080p (1, 4, 5, 6, 7, and 10 threads).]

Linux 2D/3D performance
Slightly better than gVirt.
[Figure: gVirt vs. gHyvi (y-axis from 0 to 1) on lightsmark, openarena, urbanterror, xonotic, firefox-asteroids, firefox-scrolling, midori-zoomed, and gnome-system-monitor.]

Windows 2D/3D performance
Discrepancy is acceptable.
[Figure: gVirt vs. gHyvi (y-axis from 0 to 1) on SM2.0, HDR/SM3.0, Pass2D, and Heaven.]

Summary
An optimized GPU virtualization scheme based on gVirt.
• New shadow page table: relaxed shadow page table
• Adaptive hybrid page table shadowing policies
Up to 13x performance improvement.
Source code is available at https://01.org/igvt-g

Q&A