ROOT and x32-ABI
Nathalie Rauschmayr
On behalf of LHCb collaboration
March 13, 2013
Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 1 / 20 Outline
1 History of application binary interfaces Why is a new binary interface necessary? x32-ABI - A new 32-bit ABI for x86-64 Preconditions
2 How ROOT can profit Results within other CERN applications Results within ROOT-Benchmarks Reasons for performance differences
3 Support of x32-ABI
4 Conclusion
Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 1 / 20 History of application binary interfaces
x86: 32-bit word size → 4GB addressable
x86 64 as an extension of x86: runs 32-bit as well as 64-bit applications 64-bit mode supports new hardware features Number of CPU registers
Floating-point performance
Function parameters passed via registers
Syscall instruction ...
Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 2 / 20 Why is a new binary interface necessary?
x86-64 (64-bit mode): new hardware features
no memory limit slowdown due to memory issues contention
page faults
paging ... x86: very low memory footprint
performance issues
Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 3 / 20 x32-ABI - A new 32-bit ABI for x86-64
Application Binary Interface based on 64-bit x86 architecture
Reduces size of pointers and C-datatype long to 32-bit
Takes advantage of all x64-features
Avoids memory overhead
advantages of 64-bit instruction set + memory footprint of a 32-bit application
Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 4 / 20 x32-ABI - A new 32-bit ABI for x86-64
Developed by H.J. Lu (Intel)
Introduced in Linux-Kernel 3.4 (released in summer 2012)
http://www.linuxplumbersconf.org/2011/ocw/sessions/531 Opinions in Linux-community differ quite a lot 32-bit time values will overflow
adressable memory
...
Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 5 / 20 x32-ABI - A new 32-bit ABI for x86-64
Impressive results from x32-developers:
181.mcf from SPEC CPU 2000 (memory bound):
Intel Core i7 ∼ 40% faster than x86-64 ∼ 2% slower than ia32
Intel Atom ∼ 40% faster than x86-64 ∼ 1% faster than ia32
Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 6 / 20 x32-ABI - A new 32-bit ABI for x86-64
186.crafty from SPEC CPU 2000 (64bit integer):
Intel Core i7 ∼ 3 % faster than x86-64 ∼ 40% faster than ia32
Intel Atom ∼ 4 % faster than x86-64 ∼ 26% faster than ia32
=⇒ CERN applications will definitely reduce memory footprint and very likely CPU-time
Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 7 / 20 Preconditions
x32-ABI requires: Linux Kernel 3.4 compiled with CONFIG X86 X32=y
Gcc 4.7
Binutils 2.22
Glibc 2.16
Recompiling all system libraries, required by an application, with gcc -mx32
ELF 32-bit LSB shared object, x86-64, version 1 (SYSV)
Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 8 / 20 How ROOT can profit
CERN applications use millions of pointers GAUDI framework stores all events as pointers
Many virtual functions (virtual tables: function pointers and hidden pointers)
ROOT (histogram as a tree of pointers ...)
...
Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 9 / 20 Results within other CERN applications
Memory Time
30 30 25 20 20 10 15
0 10
−10 5 Reduction of time in % Memory reduction in % −20 0
−30 −5
namd dealII soplex povray omnetpp astar xalancbmk namd dealII soplex povray omnetpp astar xalancbmk
Resident Size (32 vs x32) Resident Size (64 vs x32) CPU Time (32 vs x32) CPU Time (64 vs x32)
Figure : Comparison of memory consumption and CPU-time within the HEPSPEC-benchmarks
Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 10 / 20 Results within other CERN applications
LHCb Reconstruction with Gaudiv23r3 and ROOT 5.34.00 Reconstruction of 1000 Events: Physical Memory: -20 %
Total elapsed: -2 % reduction
LHCb Analysis with Gaudiv23r3 and ROOT 5.34.00 Analysis of 10000 Events: Physical Memory: -21 %
Total elapsed: 1.5 % increase
Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 11 / 20 Results within ROOT-Benchmarks
Following ROOT-benchmarks: stressHistogram
stress 1000 stressHepix IO
linear algebra
vector
sparse matrix
Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 12 / 20 Results within ROOT-Benchmarks
Memory Time
30 60 25 50 20 40 15 30 10 20 Reduction of time in % Memory reduction in % 5
10 0
0 Histogram Events Spectrum Linear Fit Histogram Events Spectrum Linear Fit
Resident Size (64 vs x32) Real Time (64 vs x32) CPU Time (64 vs x32)
Figure : Comparison of memory consumption and CPU-time within the ROOT-benchmarks
Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 13 / 20 Results within ROOT-Benchmarks
stress 1000:
Memory ·106 100 1
0.8 80
0.6 60
40
Memory in kB 0.4 Memory reduction in % 0.2 20
0 0 0 200 400 600 800 1,000 1,200 Histogram Events Spectrum Linear Fit Snapshots
RSS m64 VSS m64 RSS x32 VSS x32 heap - profiling (64 vs x32) heap - maps (64 vs x32)
Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 14 / 20 Reasons for performance differences
Memory seen by system and by process are different Default thresholds of malloc: t = 4 · 1024 · 1024 · sizeof (long)
if x > t: malloc uses mmap
if x < t: malloc uses sbrk mmap: memory blocks can be independently given back
anonymous memory blocks as an extension of the heap sbrk: memory blocks cannot be returned to the operating system
Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 15 / 20 Reasons for performance differences
Looking into dealII, omnetpp, ROOT stress: dealII : page faults reduced by 10%
omnetpp: page faults reduced by 38%
stress: difference in CPU-time likely caused by memory issues Further Issues: code and data alignment
x32 loop unrolling needs some tuning
zero-extensions for pointer conversion =⇒ x32-ABI still under development and there are possiblities for optimization
Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 16 / 20 Outline
1 History of application binary interfaces Why is a new binary interface necessary? x32-ABI - A new 32-bit ABI for x86-64 Preconditions
2 How ROOT can profit Results within other CERN applications Results within ROOT-Benchmarks Reasons for performance differences
3 Support of x32-ABI
4 Conclusion
Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 16 / 20 Support of x32-ABI
Ubuntu: Ongoing work, support planned for 13.04: https://blueprints.launchpad.net/ubuntu/+spec/ foundations-r-x32-planning
Gentoo: Release candidate available since summer 2012: http://www.gentoo.org/news/20120608-x32_abi.xml
Red Hat: No plans according to: http://comments.gmane.org/ gmane.linux.redhat.fedora.devel/164019
LLVM-compiler: supports x32-ABI since summer 2012 http: //www.phoronix.com/scan.php?page=news_item&px=MTI3NTk
Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 17 / 20 Outline
1 History of application binary interfaces Why is a new binary interface necessary? x32-ABI - A new 32-bit ABI for x86-64 Preconditions
2 How ROOT can profit Results within other CERN applications Results within ROOT-Benchmarks Reasons for performance differences
3 Support of x32-ABI
4 Conclusion
Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 17 / 20 Conclusion
New room for improvement, ( CERN ) applications can profit substantially
Computing Grid limits memory anyway to 4 GB per process
Gain in performance is for FREE
Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 18 / 20 Conclusion
Recompilation: Cast from time values to long (...) will produce wrong results (xrootd)
New pointer size required modifications in CINT (function and data pattern) Getting a working environment: does not work out of the box
building gcc fails due to missing glibc x32
building glibc x32 with a partly built gcc
Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 19 / 20 Any questions?
Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 20 / 20