LLVM in the FreeBSD Toolchain
David Chisnall

1 Introduction

FreeBSD 10 shipped with Clang, based on LLVM [5], as the system compiler for x86 and ARMv6+ platforms. This was the first FreeBSD release not to include the GNU compiler since the project's beginning. Although the replacement of the C compiler is the most obvious user-visible change, the inclusion of LLVM provides opportunities for other improvements.

2 Rationale for migration

The most obvious incentive for the FreeBSD project to switch from GCC to Clang was the decision by the Free Software Foundation to switch the license of GCC to version 3 of the GPL. This license is unacceptable to a number of large FreeBSD consumers. Given this constraint, the project had a choice of either maintaining a fork of GCC 4.2.1 (the last GPLv2 release), staying with GCC 4.2.1 forever, or switching to another compiler. The first option might have been feasible if other GCC users had desired the same and the cost could have been shared. The second was an adequate stopgap, but the release of the C11 and C++11 specifications, both unsupported by GCC 4.2.1, made this an impossible approach for the longer term. The remaining alternative, to find a different compiler to replace GCC, was the only viable option.

The OpenBSD project had previously investigated PCC, which performed an adequate job with C code (although generating less optimised code than even our old GCC), but had no support for C++. The TENDRA compiler had also been considered, but development had largely stopped by 2007.

The remaining alternative was Clang, which was still a very young compiler in 2008, but had some significant commercial backing from companies including Apple and Google. In 2009, Roman Divacky and Pawel Worach began trying to build FreeBSD with Clang and quickly got a working kernel, as long as optimisations were disabled. By May 2011, Clang was able to build the entire base system on both 32-bit and 64-bit x86 and so became a viable migration target. A large number of LLVM and Clang bugs were found and fixed as a result of FreeBSD testing the compilation of a large body of code.

3 Rebuilding the C++ stack

The compiler itself was not the only thing that the FreeBSD project adopted from GCC. The entire C++ stack was developed as part of the GCC project and underwent the same license switch. This stack comprised the C++ compiler (g++), the C++ language runtime (libsupc++) and the C++ Standard Template Library (STL) implementation (libstdc++).

All of these components required upgrading to support the new C++11 standard. The runtime library, for example, required support for dependent exceptions, where an exception can be boxed and rethrown in another thread (or the same thread later).

The FreeBSD and NetBSD Foundations jointly paid PathScale to open source their C++ runtime library (libcxxrt), which was then integrated into the FreeBSD base system, replacing libsupc++. The LLVM project provided an STL implementation (libc++), with full C++11 and now C++14 support, which was duly integrated.

Using libcxxrt under libstdc++ allowed C++ libraries that exposed C interfaces, or C++ interfaces that didn't use STL types, to be mixed in the same binary as those that used libc++. This includes throwing exceptions between such libraries.

Implementing this in a backwards-compatible way required some linker tricks. Traditionally, libsupc++ had been statically linked into libstdc++, so from the perspective of all linked programs the libsupc++ symbols appeared to come from libstdc++. In later versions in the 9.x series, and in the 9-COMPAT libraries shipped for 10, libstdc++ became a filter library, dynamically linked to libsupc++. This allows symbol resolution to work correctly and allows libsupc++ or libcxxrt to be used as the filtee, which actually provides the implementation of these symbols.

4 Problems with ports

The FreeBSD ports tree is a collection of infrastructure for building around 24,000 third-party programs and libraries. Most ports are very thin wrappers around the upstream distribution's build system, running autoconf or CMake configurations and then building the resulting make files or equivalent. For well-written programs, the switch to Clang was painless. Unfortunately, well-written programs make up the minority of the ports tree. Getting the ports tree working with Clang required a number of bug fixes.

4.1 Give up, use GCC

The first stopgap measure was to add a flag to the ports tree allowing ports to select that they require GCC. At the coarsest granularity is the USE_GCC knob, which allows a port to specify that it requires either a specific version of GCC, or a specific minimum version.

This is a better-than-nothing approach to getting ports building again, but is not ideal. There is little advantage in switching to a new base system compiler if we are then going to use a different one for a large number of ports. We also encounter problems due to GCC's current inability to use libc++, meaning that it is hard to compile C++ ports with GCC if they depend on libraries that are built with Clang, and vice versa. Currently around 1% of the ports tree requires this. Quite a few more use the flags exposed in the compiler namespace for the port's USES flags. In particular, specifying USES=compiler:openmp will currently force a port to use GCC, as our Clang does not yet include OpenMP support.

This framework allows ports to specify the exact features of GCC that they require, allowing them to be switched to using Clang once the [...]

4.2 The default dialect

One of the simplest, but most common, things to fix was the assumption by a lot of ports that they could invoke the cc program and get a C89 compiler. POSIX97 deprecated the cc utility, because it accepts an unspecified dialect of C, which at the time might have been K&R or C89. Over a decade later, some code is still trying to use it. Today, it may require K&R C (very rare), C89 (very common), C99 (less common), or C11 (not yet common), and so should be explicitly specifying a dialect. This was a problem, because gcc, when invoked as cc, defaults to C89, whereas clang defaulted to C99 and now to C11.

This is not usually an issue, as the new versions of the C standard are intended to be backwards compatible. Unfortunately, although valid C89 code is usually valid C99 or C11 code, very little code is actually written in C89. Most C ports are written in C plus GNU extensions. In particular, C99 introduced the inline keyword, with a different meaning to the inline keyword available as a GNU extension to C89. This change causes linker failures when C89 code with GNU-flavoured inline functions is compiled as C99. For most ports, this was fixed by adding -fgnu89-inline to the port's CFLAGS.

4.3 C++ templates

Another common issue in C++ code relates to two-phase lookup in C++ templates. This is a particularly tricky part of the C++ stack and both GCC and Microsoft's C++ compiler implemented it in different, mutually incompatible, wrong ways. Clang implements it correctly, as do new versions of other compilers. Unlike other compilers, Clang does not provide a fallback mode, accepting code with GNU or Microsoft-compatible errors.

The most common manifestation of this difference is template instantiations failing with an unknown identifier error. Often these can be fixed by simply specifying this-> in front of the variable named in the error message. In some more complex programs, working out exactly what was intended is a problem and so fixing it is impossible for the port maintainer.

This is currently the largest cause of programs requiring GCC. In particular, some big C++ projects such as the Sphinx speech recognition engine have not had new releases for over five years and so are unlikely to be fixed upstream. Several of these ports will only build with specific versions of GCC as well and so are still built with GCC in the ports tree. Fortunately, many of these (for example, some of the KDE libraries) are now tested upstream with Clang for Mac OS X compatibility and so simply updating the port to a newer version fixed incompatibilities.

4.4 Goodbye tr1

C++ Technical Report 1 (TR1) is a set of experimental additions to C++ that were standardised between C++03 and C++11. It provided a number of extensions that were in headers in the tr1/ directory [...]

[...] behaviour. This is particularly prevalent in configure scripts. For example, Mono checks whether isnan(1) works, which checks whether there is a version of isnan() that accepts an integer argument. If it doesn't find one, then it provides an implementation of isnan() that accepts a double as the argument, which causes linker failures.

Fixing these was relatively easy, but time consuming. Most of the errors were in configure scripts, but we did find a small number of real bugs in code.

4.6 OpenMP

One of the current limitations of Clang as a C/C++ compiler is its lack of OpenMP support. OpenMP is a pragma-based standard for compiler-assisted parallelism and so is increasingly important in an era when even mobile devices have multiple cores.