and : The New Normal

Presented by: Behan Webster (LLVMLinux project lead)

LLVMLinux project Presentation Date: 2015.02.18 New Clang Benchmarks (compilation)

Timed ImageMagick Compilation v6.8.1-10 Time To Compile

Seconds, Less Is Better OpenBenchmarking.org

GCC 4.8.2 60.35 SE +/- 0.36

GCC 4.9.0 RC1 60.59 SE +/- 0.04

LLVM Clang 3.5 20140413 29.61 SE +/- 0.14

14 28 42 56 70

Powered By 5.0.1 http://www.phoronix.com/scan.php?page=article&item=gcc49_compiler_llvm35&num=1 LLVMLinux project New Clang Benchmarks ()

ebizzy v0.3 Records/s

Seconds, More Is Better OpenBenchmarking.org

GCC 4.8.2 42849 SE +/- 109.85

GCC 4.9.0 RC1 42950 SE +/- 44.08

LLVM Clang 3.5 20140413 43202 SE +/- 53.03

9000 18000 27000 36000 45000

Powered By Phoronix Test Suite 5.0.1 1. (CC) gcc options: -pthread -lpthread -O3 -march=native http://www.phoronix.com/scan.php?page=article&item=gcc49_compiler_llvm35&num=1 LLVMLinux project LLVMLinx Patched Mainline Kernel Tree

● A mainline kernel tree with all LLVMLinux patches applied on top is now available: – git://git.linuxfoundation.org/llvmlinux/kernel.git ● Dated llvmlinux branches – remotes/origin/llvmlinux-2015.01.18 ● The master branch is rebased regularly

LLVMLinux project LLVMLinux Project Status

● LLVM/clang: – All LLVMLinux patches for LLVM are Upstream – Newer LLVM patches to support the are mostly being added by upstream maintainers – Bug 4068 - [META] Compiling the Linux kernel with clang – 8 LLVM open Bugs to compile the kernel

LLVMLinux project LLVMLinux Project Status

● Linux Kernel: – Roughly 40 kernel patches for 4 architectures ● , arm, arm64/, mips – And ppc support independently upstreamed – LLVMLinux branch in linux-next

LLVMLinux project Kernel Patches

Architecture Number of patches Submitted No arch 21 9 aarch64 3 0 arm 6 2 mips 6 0 x86_64 4 0 TOTALS 40 11

LLVMLinux project Building the kernel with clang

● Install clang (from distro, source, or .org) – http://llvm.org/releases/download.html ● Get the code: mainline or patched kernel git clone git://git.linuxfoundation.org/llvmlinux/kernel.git ● Native build make CC=clang ● Cross build with clang

make CC=clang ARCH=arm CROSS_COMPILE=arm-linux.gnueabi- LLVMLinux project Up to date prebuilt binaries

● Up to date 3.6 prebuilt binaries available for: – from sid/main – Fedora from llvm.org – OpenSUSE from llvm.org – Ubuntu from ppa:xorg-edgers/ppa or llvm.org http://llvm.org/releases/download.html

LLVMLinux project Clang vs gcc

● Clang generates code which is roughly the same size and speed as gcc ● Clang is improving at a tremendous rate however... ● Competition from clang has meant that gcc has improved a lot recently (error reporting, etc) ● So why use clang?

LLVMLinux project Advantages to using more than one CC

● The standard is long, can difficult to interpret, and doesn't define everything → ● This means that every C is a little different

LLVMLinux project Advantages to using more than one CC

● Most try their code with one compiler only ● If it works, most people move on ● This means that assumptions about what is valid C code can inadvertently lead to code which uses undefined behavior

LLVMLinux project Advantages to using more than one CC

● Using more than one interpretation/implementation of the C standard can catch issues before they become a problem ● The Linux kernel has also traditionally used a lot of gnu extensions which mostly aren't strictly required anymore with newer C versions

LLVMLinux project More than one CC FTW

● More than one CC means code is more portable ● More portable often means more standards compliant ● More standards compliant often means more correct

LLVMLinux project

LLVMLinux project Debian recompiled with clang

Clang Packages Failures Percentage version built

3.3 18854 2188 11.6% 3.4 21204 2221 10.5% 3.4.2 21383 2040 9.5% 3.5.0 22202 1261 5.7% 3.6.0rc1 22202 1307 5.9%

LLVMLinux project Unpublicized success

● Recently a large company was able to find a significant error in their private kernel which had apparently been missed by gcc ● Over 2 weeks of work had been put into finding the error which clang found within minutes ● I hear now that their engineers compile their kernel code with both

LLVMLinux project Toolization

● Both libllvm and libclang allow the same technology in clang to be included into other tools ● Gcc is explicitly not toolized in order to make it difficult to comingle it with non-copylefted code

LLVMLinux project What can be built with libclang?

● Most static analyzers implement their own parser/grammar which means yet-another-C-standard- interpretation ● Clang Static Analyzer uses libclang which means having full access to the parse tree AST ● This means that it sees what the compiler can see

LLVMLinux project Integrated Assembly Integration

● The Assembler is used, in part, to inline assembly into C code ● In these situations there are a lot of parameters instructing the compiler on how the ASM should be included

● gcc uses an external assembler and leaves much of the validation of the ASM to gnu as

● Clang can also use gnu as, but also has the option of using the Integrated Assembler which is faster and more integrated into the compiler for tighter code checking and better error messages

LLVMLinux project Integrated Assembly Integration (2)

● Gcc doesn't validate ASM, but clang does (because of IA) ● There are places in the kernel code where ASM not being validated is taken advantage of to export information from CC (the generation of bounds.s)

● This code breaks in clang since what is being exported are macros and not valid ASM ->NR_PAGEFLAGS $23 __NR_PAGEFLAGS ->MAX_NR_ZONES $4 __MAX_NR_ZONES ● In this case clang catches the inappropriate use of ASM

LLVMLinux project extern inline: Different for gnu89 and gnu99/C11

● GNU89/GNU90 (used by gcc, gcc5 is C11) – Function will be inlined where it is used – No function definition is emitted – A non-inlined function may also be provided ● GNU99/ (used by clang, clang-3.6 is C11) – Function will be inlined where it is used – An external function is emitted – No other function of the same name may be provided. ● Solution? Use “static inline” instead.

LLVMLinux project Other things which affect inlining

● Both gcc5 and clang-3.6 are C11 by default ● However a number of kernel developers have found issues with code which makes the kernel not yet C11 compliant

● The changes to the kernel code for clang pushed for at least C99 compliance with the idea that C11 isn't far off

● As of Linux v3.18 -std=gnu89 is now being passed to CC to force the old behavior

● Linus insists that this should only be a temporary workaround

LLVMLinux project Attribute Order

● gcc is less picky about placement of __attribute__(()) ● clang requires it at the end of the type or variable ● This arguably ends up being a lot more readable

-struct __read_mostly va_alignment va_align = { +struct va_alignment __read_mostly va_align = {

LLVMLinux project Named Registers

● Global named registers now fixed for x86, arm, aarch64, mips register unsigned long current_stack_pointer asm ("sp");

LLVMLinux project Variable Length Arrays In Structs

● VLAIS isn't supported by Clang (now documented gcc extension) char vla[n]; /* Supported, C99/C11 */ struct { char flexible_member[]; /* Supported, C99/C11 */ } struct_with_flexible_member; struct { char vlais[n]; /* Explicitly not allowed by C99/C11 */ } variable_length_array_in_struct;

● VLAIS is used in the Linux kernel in a number of places, spreading mostly through reusing patterns from data structures found in crypto (fixes for which have now been merged)

LLVMLinux project VLAIS Removal Example

- struct { - struct shash_desc shash; - char ctx[crypto_shash_descsize(hash)]; - } desc; + SHASH_DESC_ON_STACK(shash, hash);

- desc.shash.tfm = hash; + shash->tfm = hash; (from crypto/hmac.c)

LLVMLinux project Status of VLAIS in the Linux Kernel

● USB Gadget patch is in mainline ● Netfilter patch is in mainline ● Crypto patches are upstream (still a few cleanup patches) ● Only the following VLAIS patches are left: (exofs, md-raid10, nfs, wimax-i2400m)

LLVMLinux project Nested Functions

● The kernel community generally don't like the use of nested functions ● However they kept being added to the kernel ● About half a dozen uses of nested functions have been removed. ● Clang keeps finding more of them ● The latest one is in drivers/md/bcache/sysfs.c

LLVMLinux project How Can You Help?

● Make it known you want to be able to use Clang to compile the kernel ● Test the LLVMLinux patches

● Report bugs to the mailing list

● Help get clang related patches upstream

● Work on unsupported features and Bugs

● Submit new targets and arch support

● Patches welcome

LLVMLinux project Embrace the Dragon. He's cuddly.

Thank you

http://llvm.linuxfoundation.org

LLVMLinux project Contribute to the LLVMLinux Project

● Project wiki page – http://llvm.linuxfoundation.org ● Project Mailing List – http://lists.linuxfoundation.org/mailman/listinfo/llvmlinux – http://lists.linuxfoundation.org/pipermail/llvmlinux/ ● IRC Channel – #llvmlinux on OFTC – http://buildbot.llvm.linuxfoundation.org/irclogs/OFTC/%23llvmlinux/

● LLVMLinux Community on Plus LLVMLinux project