ROOT and x32-ABI

Nathalie Rauschmayr

On behalf of the LHCb collaboration

March 13, 2013

Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 1 / 20

Outline

1 History of application binary interfaces
  Why is a new binary interface necessary?
  x32-ABI - A new 32-bit ABI for x86-64
  Preconditions

2 How ROOT can profit
  Results within other CERN applications
  Results within ROOT-Benchmarks
  Reasons for performance differences

3 Support of x32-ABI

4 Conclusion

History of application binary interfaces

x86: 32-bit word size → 4 GB addressable

x86-64 as an extension of x86:
  runs 32-bit as well as 64-bit applications
  64-bit mode supports new hardware features:
    Number of CPU registers
    Floating-point performance
    Function parameters passed via registers
    Syscall instruction
    ...

Why is a new binary interface necessary?

x86-64 (64-bit mode):
  new hardware features
  no memory limit
  slowdown due to memory issues:
    contention
    page faults
    paging
    ...

x86:
  very low memory footprint
  performance issues

x32-ABI - A new 32-bit ABI for x86-64

Application Binary Interface based on 64-bit x86 architecture

Reduces size of pointers and C-datatype long to 32-bit

Takes advantage of all x64-features

Avoids memory overhead

advantages of 64-bit instruction set + memory footprint of a 32-bit application
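
As an illustration of the size change described above, the following C++ sketch reports which ABI a translation unit is compiled for, together with the sizes of the affected types. It assumes a GCC-compatible compiler that predefines the macros __x86_64__, __ILP32__ and __i386__. The helper names abi_name, type_sizes and check_abi_matches_sizes are hypothetical and not part of ROOT or of the x32 toolchain.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Name of the ABI this translation unit is compiled for.
// x32 is the combination of the 64-bit instruction set (__x86_64__)
// with 32-bit int, long and pointers (__ILP32__).
inline std::string abi_name() {
#if defined(__x86_64__) && defined(__ILP32__)
    return "x32";
#elif defined(__x86_64__)
    return "x86-64";
#elif defined(__i386__)
    return "ia32";
#else
    return "other";
#endif
}

// Sizes (in bytes) of the types whose width depends on the ABI.
struct TypeSizes {
    std::size_t pointer_bytes;
    std::size_t long_bytes;
    std::size_t long_long_bytes;
};

inline TypeSizes type_sizes() {
    return {sizeof(void*), sizeof(long), sizeof(long long)};
}

// Sanity check: the detected ABI must agree with the observed sizes.
inline void check_abi_matches_sizes() {
    const std::string abi = abi_name();
    if (abi == "x32" || abi == "ia32") {
        assert(sizeof(void*) == 4 && sizeof(long) == 4);
    }
    if (abi == "x86-64") {
        assert(sizeof(void*) == 8);
    }
}
```

Built with gcc -mx32, abi_name() is expected to return "x32" with 4-byte pointers and a 4-byte long; built with gcc -m64 on Linux it returns "x86-64" with 8-byte pointers.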

x32-ABI - A new 32-bit ABI for x86-64

Developed by H.J. Lu (Intel)

Introduced in Linux kernel 3.4 (released in May 2012)

http://www.linuxplumbersconf.org/2011/ocw/sessions/531

Opinions in the Linux community differ quite a lot:
  32-bit time values will overflow
  addressable memory
  ...

x32-ABI - A new 32-bit ABI for x86-64

Impressive results from x32-developers:

181.mcf from SPEC CPU 2000 (memory bound):
  Intel Core i7: ∼40% faster than x86-64, ∼2% slower than ia32
  Intel Atom:    ∼40% faster than x86-64, ∼1% faster than ia32

x32-ABI - A new 32-bit ABI for x86-64

186.crafty from SPEC CPU 2000 (64-bit integer):
  Intel Core i7: ∼3% faster than x86-64, ∼40% faster than ia32
  Intel Atom:    ∼4% faster than x86-64, ∼26% faster than ia32

⇒ CERN applications will definitely reduce their memory footprint and very likely their CPU time

Preconditions

x32-ABI requires:
  Linux kernel 3.4 compiled with CONFIG_X86_X32=y
  GCC 4.7
  Binutils 2.22
  Glibc 2.16
  Recompiling all system libraries required by an application with gcc -mx32

ELF 32-bit LSB shared object, x86-64, version 1 (SYSV)

How ROOT can profit

CERN applications use millions of pointers:
  GAUDI framework stores all events as pointers
  Many virtual functions (virtual tables: function pointers and hidden pointers)
  ROOT (histogram as a tree of pointers ...)
  ...
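
To make the effect of pointer width on such structures concrete, here is a small C++ model. It computes the size of a hypothetical polymorphic tree node (one hidden vtable pointer, three explicit pointers, one 4-byte integer) for a given pointer width. The node layout and the names round_up, node_bytes and reduction_percent are assumptions chosen for illustration. They do not describe an actual GAUDI or ROOT class.

```cpp
#include <cassert>
#include <cstddef>

// Round size up to the next multiple of alignment.
constexpr std::size_t round_up(std::size_t size, std::size_t alignment) {
    return (size + alignment - 1) / alignment * alignment;
}

// Modelled size of a hypothetical polymorphic tree node holding
//   1 hidden vtable pointer,
//   3 explicit pointers (parent, left child, right child),
//   1 four-byte integer payload,
// padded to a multiple of the pointer width.
inline std::size_t node_bytes(std::size_t pointer_bytes) {
    assert(pointer_bytes == 4 || pointer_bytes == 8);
    const std::size_t raw = 4 * pointer_bytes + 4;
    return round_up(raw, pointer_bytes);
}

// Relative saving in percent when going from `before` to `after` bytes.
inline double reduction_percent(std::size_t before, std::size_t after) {
    assert(before > 0 && after <= before);
    return 100.0 * static_cast<double>(before - after) / static_cast<double>(before);
}
```

In this model node_bytes(8) is 40 and node_bytes(4) is 20, a reduction of 50 %. This is an upper bound for a node made almost entirely of pointers. Real applications also hold data whose size does not depend on the ABI, so their overall reduction is presumably smaller.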

Results within other CERN applications

[Figure: two panels, "Memory" (memory reduction in %) and "Time" (reduction of time in %), for the benchmarks namd, dealII, soplex, povray, omnetpp, astar and xalancbmk. Series: Resident Size (32 vs x32), Resident Size (64 vs x32), CPU Time (32 vs x32), CPU Time (64 vs x32).]

Figure: Comparison of memory consumption and CPU time within the HEPSPEC benchmarks

Results within other CERN applications

LHCb Reconstruction with Gaudi v23r3 and ROOT 5.34.00
Reconstruction of 1000 events:
  Physical memory: 20 % reduction
  Total elapsed time: 2 % reduction

LHCb Analysis with Gaudi v23r3 and ROOT 5.34.00
Analysis of 10000 events:
  Physical memory: 21 % reduction
  Total elapsed time: 1.5 % increase

Results within ROOT-Benchmarks

Following ROOT-benchmarks:
  stressHistogram
  stress 1000
  stressHepix:
    IO
    linear algebra
    vector
    sparse matrix

Results within ROOT-Benchmarks

[Figure: two panels, "Memory" (memory reduction in %) and "Time" (reduction of time in %), for the benchmarks Histogram, Events, Spectrum, Linear and Fit. Series: Resident Size (64 vs x32), Real Time (64 vs x32), CPU Time (64 vs x32).]

Figure: Comparison of memory consumption and CPU time within the ROOT benchmarks

Results within ROOT-Benchmarks

stress 1000:

[Figure: "Memory". Axes recovered from the plot: memory in kB over snapshots, with series RSS m64, VSS m64, RSS x32 and VSS x32; memory reduction in % for Histogram, Events, Spectrum, Linear and Fit, with series heap - profiling (64 vs x32) and heap - maps (64 vs x32).]

Reasons for performance differences

The memory seen by the system and the memory seen by the process are different

Default threshold of malloc for a request of size x: t = 4 · 1024 · 1024 · sizeof(long)
  if x > t: malloc uses mmap
  if x < t: malloc uses sbrk

mmap:
  memory blocks can be independently given back
  anonymous memory blocks as an extension of the heap
sbrk:
  memory blocks cannot be returned to the operating system
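
A worked example of the threshold formula may help, since sizeof(long) differs between the two ABIs. The C++ sketch below evaluates t for a 64-bit and for a 32-bit long and classifies a request accordingly. The function names mmap_threshold_bytes and allocation_path are hypothetical. The sketch only restates the rule from the slide and does not call into or reproduce the allocator itself.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Threshold from the slide: t = 4 * 1024 * 1024 * sizeof(long).
// sizeof(long) is passed in as a parameter so that both ABIs can be
// compared on a single machine.
inline std::size_t mmap_threshold_bytes(std::size_t long_bytes) {
    assert(long_bytes == 4 || long_bytes == 8);
    return static_cast<std::size_t>(4) * 1024 * 1024 * long_bytes;
}

// Mechanism that serves a request of request_bytes under the rule above.
// The slide leaves the case x == t open; this sketch assigns it to sbrk
// (an assumption made only for illustration).
inline std::string allocation_path(std::size_t request_bytes, std::size_t long_bytes) {
    return request_bytes > mmap_threshold_bytes(long_bytes) ? "mmap" : "sbrk";
}
```

With an 8-byte long the threshold is 32 MiB, with a 4-byte long it is 16 MiB. Under this rule a 20 MiB request is classified as sbrk for the 64-bit ABI but as mmap for x32, so the same program can return memory to the operating system differently depending on the ABI.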

Reasons for performance differences

Looking into dealII, omnetpp, ROOT stress:
  dealII: page faults reduced by 10%
  omnetpp: page faults reduced by 38%
  stress: difference in CPU time likely caused by memory issues

Further issues:
  code and data alignment
  x32 loop unrolling needs some tuning
  zero-extensions for pointer conversion

⇒ x32-ABI is still under development and there are possibilities for optimization
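
The slides do not say how the page-fault counts were obtained. One possible way to read them from inside a process on a POSIX system is getrusage, sketched below in C++. The struct PageFaults and the function current_page_faults are hypothetical names, and the approach is an assumption rather than the method used for the measurements above.

```cpp
#include <cassert>
#include <sys/resource.h>

// Page-fault counters of the calling process.
struct PageFaults {
    long minor;  // faults serviced without I/O (ru_minflt)
    long major;  // faults that required I/O (ru_majflt)
};

// Read the counters accumulated so far via getrusage(RUSAGE_SELF).
inline PageFaults current_page_faults() {
    struct rusage usage {};
    const int rc = getrusage(RUSAGE_SELF, &usage);
    assert(rc == 0);
    (void)rc;  // silence the unused warning when assertions are disabled
    return {usage.ru_minflt, usage.ru_majflt};
}
```

Calling current_page_faults() before and after a benchmark section and subtracting the two results gives the number of faults caused by that section, which can then be compared between a 64-bit and an x32 build.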

Support of x32-ABI

Ubuntu: Ongoing work, support planned for 13.04: https://blueprints.launchpad.net/ubuntu/+spec/foundations-r-x32-planning

Gentoo: Release candidate available since summer 2012: http://www.gentoo.org/news/20120608-x32_abi.xml

Red Hat: No plans according to: http://comments.gmane.org/gmane.linux.redhat.fedora.devel/164019

LLVM compiler: supports x32-ABI since summer 2012: http://www.phoronix.com/scan.php?page=news_item&px=MTI3NTk

Conclusion

New room for improvement: (CERN) applications can profit substantially

Computing Grid limits memory anyway to 4 GB per process

Gain in performance is for FREE

Conclusion

Recompilation:
  Cast from time values to long (...) will produce wrong results (xrootd)
  New pointer size required modifications in CINT (function and data pattern)

Getting a working environment:
  does not work out of the box
  building gcc fails due to missing glibc x32
  building glibc x32 with a partly built gcc
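
The remark about casting time values to long can be illustrated with fixed-width integers. The C++ sketch below assumes that time values are 64 bits wide while long is 32 bits wide, which is the combination that makes such a cast lossy. The helper names fits_in_32bit_long, low_32_bits and narrow_time_checked are hypothetical and are not taken from xrootd.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// True if a 64-bit time value survives a cast to a 32-bit signed long unchanged.
inline bool fits_in_32bit_long(std::int64_t seconds) {
    return seconds >= std::numeric_limits<std::int32_t>::min() &&
           seconds <= std::numeric_limits<std::int32_t>::max();
}

// Low 32 bits that remain after narrowing. Conversion to an unsigned
// type is well defined: the result is the value modulo 2^32.
inline std::uint32_t low_32_bits(std::int64_t seconds) {
    return static_cast<std::uint32_t>(seconds);
}

// Guarded narrowing: aborts in debug builds instead of silently
// returning a wrong value.
inline std::int32_t narrow_time_checked(std::int64_t seconds) {
    assert(fits_in_32bit_long(seconds));
    return static_cast<std::int32_t>(seconds);
}
```

A timestamp from March 2013 such as 1363132800 still fits into 32 bits, so this kind of bug can stay hidden in tests. A value of 2147483648 or larger no longer fits, and only its low 32 bits survive the cast.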

Any questions?
