SEE TEXT ONLY

FreeBSD IN SCIENTIFIC COMPUTING

By Jason Bacon I’ve been running both FreeBSD and uninterrupted since the mid-1990s. I’ve run many different Linux distributions over the years, most recently focusing on CentOS. I also have significant experience with Mac OS X and NetBSD and have experimented with many other BSD platforms. As a staunch agnostic with a firm belief in the value of open standards, I like to remain familiar with all the options in the POSIX world so I’m always prepared to choose the best tool for the job.

ver the years, I’ve watched barriers to rologists, neuropsychologists, cell biologists, psychia- FreeBSD use fall one by one. Open-source trists, and biophysicists. During most of this time, I Osoftware continues to spread like the Blob, was the sole IT support person for several labs, filling in niches once held exclusively by commercial peaking at over 60 researchers. I was responsible for and other closed-source . maintaining many workstations as well as OpenOffice/LibreOffice, OpenJDK, Clang, Flang, managing all the research software needed for fMRI and many other high-quality open-source products analysis. The need for a full-featured and extremely have made it possible for most people to do every- reliable became very clear, very thing they need on FreeBSD with ease. The few quickly. FreeBSD answered that call and made it remaining limitations of what can be run on possible for me to single-handedly keep this impor- FreeBSD have become increasingly esoteric and tant research moving forward for many years. they, too, seem destined to disappear in time. Since 2009, I have been supporting a wide range My need to run MS Windows for personal use of research, including engineering, bioinformatics, came to an end more than a decade ago, although physics, math, chemistry, public health, business, I still support it to some extent for work. Mac OS X and psychology. A major part of this environment has filled the few remaining needs that had been has been our HPC clusters running CentOS Linux. served by Windows, but even my Mac has now FreeBSD has also played an important role as a been reduced to a once-per-year platform for run- development and testing platform, and the primary ning tax software. I could, of course, use FreeBSD OS on our educational HPC cluster and HTCondor for this as well with a web-based tax program, if grid. It has also provided the model for how we not for the reality of in the manage most open-source software on our CentOS cloud. The bottom line is that I now find FreeBSD systems, using , a cross-platform package easier and more pleasant to use than commercial manager from the NetBSD project, originally derived operating systems for most of my work and per- from FreeBSD ports. sonal computing. Most of my professional work since late 1999 Already an Important Part of HPC has been in support of scientific computing. From FreeBSD already plays crucial roles in research, 1999 until 2008, I supported fMRI brain-mapping including many high-performance computing research for a multidisciplinary group including neu- (HPC) environments, although some may not rec-

4 FreeBSD Journal ognize it by name. FreeBSD is the foundation of staff to be as productive as possible, by avoiding sys- some of the most popular high-performance storage tem outages and performing common tasks such as appliances such as FreeNAS, Isilon, NetApp, and software deployment as quickly and cleanly as possi- Panasas. FreeBSD is also an important component of ble. This is where FreeBSD stands out among its Juniper network switches, pfSense firewalls, Apple’s peers. FreeBSD’s unsurpassed reliability and ease of OS X, and runs servers for many web-hosting and management save precious IT man-hours that could cloud services. FreeBSD enjoys strong support from be used for more creative endeavors. It can literally other vendors as well, including Mellanox, which is reduce the need for man-hours by an order of mag- currently developing FreeBSD drivers for their net- nitude over the methods often used by inexperi- work adapters. enced sysadmins in the research community. Outside scientific research, FreeBSD is used by Experienced systems managers in this situation some of the biggest players in the business, who know to stay away from bleeding-edge operating need to maximize performance and reliability. Ever systems, which are likely to bring repeated surprises watch a movie on Netflix? If so, you’re a FreeBSD that will distract them from more creative work. For user. The Netflix servers streaming movies to your TV this reason, the vast majority of large HPC clusters or computer run FreeBSD. FreeBSD has also been run Enterprise Linux rather than other distributions used for Yahoo! servers for decades. For a current with the latest kernel, , and other system list of notable companies using FreeBSD, see the software. This is not a criticism of bleeding-edge FreeBSD Handbook (https://www.freebsd.org/ platforms. In fact, it’s important to all of us that doc/en_US.ISO8859-1/books/handbook/ many typical users use them and work the bugs out nutshell.html#introduction-nutshell-users). of the latest new kernel features, compilers, etc. Today’s bleeding-edge is tomorrow’s Enterprise. FreeBSD as a Enterprise Linux systems achieve a high level of Scientific computing has reached a sort of utopia in reliability and long-term binary compatibility by tak- recent years due to fast, cheap hardware and full- ing a snapshot of a recent bleeding-edge system and featured, . For a typical workstation or declaring a moratorium on major upgrades. For laptop, it’s hard for a scientist to go wrong. There example, Redhat Enterprise and CentOS are based are multiple free operating systems to choose from, on a snapshot of Fedora. The down side of this mostly based on BSD, Linux, and Solaris. Any one of approach is that the tools and libraries that ship with them will serve the needs of a typical scientist quite it are outdated and unable to support the latest open- well. There are POSIX compatibility layers for MS source scientific software. For example, Redhat Windows such as and Windows Subsystem Enterprise 7, the latest release as of this writing, ships for Linux (WSL), which facilitate running Unix soft- with GCC 4.8.5 (June 2015) and GNU libc 2.17. The ware directly on Windows, and free virtual machines, latest releases are GCC 8.1 and libc 2.27. The only such as VirtualBox and qemu, that allow us to run way to build the latest open source on such a system multiple operating systems at the same time on the is by installing newer core tools and libraries. same machine. Replacing them outright would sacrifice the stability and binary compatibility with commercial software for which Enterprise Linux is designed. The solution we Unsurpassed Reliability employ on our CentOS systems is to leave all these But what about those of us who manage many core tools in-place and use pkgsrc, a cross-platform multi-user servers for research? There has always , to install newer tools alongside been, and always will be, a severe shortage of skilled them. Many others have resorted to using containers systems managers. Nowhere is this shortage felt or virtual machines to provide a more modern envi- more painfully than in scientific research. The law of ronment, isolated from the outdated Enterprise base. supply and demand dictates that experienced sys- FreeBSD allows us to avoid these issues by offer- tems managers are expensive. Scientific researchers ing stability matching or exceeding that of Enterprise their living by repeatedly begging for grant Linux while providing more modern tools in the money to keep their labs running. With few excep- base. As of this writing, FreeBSD 11.2 and the tions, most researchers have too little funding and upcoming FreeBSD 12 release both provide clang 6 cannot hope to compete with wealthy corporations (March, 2018) as a base and easily support for the limited IT talent pool. In fact, most principle most current open-source scientific software. investigators in academic research earn far less than an experienced Unix sysadmin. Minimize Management Time In this scenario, choosing a system that minimizes A FreeBSD system can be installed from scratch IT man-hours is crucial. We must enable a small IT in about 5 minutes and fully configured for many

July/August 2018 5 purposes in less than an hour, thanks to well- cleaned up so that it’s more portable and easier to designed tools in the base system and the FreeBSD deploy. However, given the constant, rapid progress ports system for managing add-on packages (more of science and the rotating door of student-develop- on this later). The FreeBSD handbook is a well-writ- ers, the research computing community is basically ten and excellent tutorial for most common tasks doomed forever to a world full of nascent, disorgan- and is kept fairly up-to-date. No need to search the ized code. web and risk following outdated or erroneous The FreeBSD ports system can alleviate this prob- instructions. lem in two ways: Ongoing maintenance is quick and easy using FreeBSD ports has one of the largest collections of -update, a binary update system for installing existing packages of any package manager. At the bug fixes and security patches in the base system. time of this writing, FreeBSD users can install any of Unlike many other operating systems, FreeBSD secu- more than 32,000 packages with one simple com- rity notices include clear instructions on how to apply mand. If the package you need is not already in the patches, including when a reboot or service restart is collection, chances are that most or all the dependen- required. cies are there, so the effort needed to create a new FreeBSD port is often a small fraction of that required Software Management with to do a caveman install. Note also that FreeBSD's port FreeBSD Ports options substantially increase the number of possible To go from idea to a published paper in scientific software installations. This has to be considered when computing involves four steps: comparing the ~32,000 FreeBSD ports against the number of binary packages in systems that do not 1. Develop the software support convenient builds from source. Many of the 2. Deploy the software binary packages in such systems are merely different 3. Learn the software builds of the same software. 4. Run the software Imagine a scenario where instead of thousands of scientists each wasting forty hours struggling with Steps 1 and 3 are mostly platform-independent. the same caveman installation, one of them creates a Unfortunately, step 2, software deployment, is often FreeBSD port and everyone else in the world from one of the major bottlenecks in scientific computing. that day on can install the software in seconds. Step 4, running the software, is a lesser, but real bot- That’s many thousands of man-hours redirected from tleneck requiring the ability to deploy an optimized senseless, duplicated IT effort to productive scientific build of the software. FreeBSD ports has the poten- exploration. This is the potential of FreeBSD ports tial to easily eliminate these bottlenecks, as it allows and other package managers. the user to easily install from a huge collection of There is another important difference between binary packages and just as easily install any package binary package managers, which quickly install pre- from source with additional compiler options. compiled packages along with dependencies, and FreeBSD also has strong support for OpenMP, package managers that make it convenient to build pthreads, and MPI, for cases where parallel comput- from source, such as FreeBSD ports, Gentoo , ing is needed to reduce run times. MacPorts, and pkgsrc. Binary packages install much The vast majority of scientific software is open faster, of course, but they may suffer from compati- source, developed on a variety of (usually bleeding- bility, security, and performance problems. To be edge) platforms, and often deployed via very primi- portable, they must be compiled with static libraries tive methods. Many developers don't target package and limited to common CPU features, which pre- managers at all, but instead provide precompiled cludes utilizing new instructions and other CPU fea- binaries for their own development platform and tures that may significantly improve performance. In maybe a few others, alongside cryptic instructions extreme cases, you may see a 30% improvement in for performing a “caveman” install, manually build- speed from a non-portable binary utilizing all avail- ing from source after installing dependencies, or able CPU features. This can save thousands of core- worse, using their bundled dependency software. hours on an HPC cluster running a large analysis. The chances of their build-from-source instructions With FreeBSD ports, building an optimized binary working for the average user are almost nil. from the , utilizing the best features of Part of the problem is that the developers are your CPU, is as easy as installing the binary package. mostly scientists with little or no computer science It will take longer for the computer to build and training, very often self-taught graduate students install, of course, but the effort for you is about the developing software for their dissertation. They don’t same. To install a portable (possibly slow) binary know much about sustainable development and sys- package, we might use the following: tems management practices. If their software lives beyond their graduation date, it will likely get pkg install canu 6 FreeBSD Journal To build an optimized version from source, we would adjust our build settings in /etc/make.conf pkg upgrade (e.g. by adding CXXFLAGS+=-march=native), and (Note that this may replace your optimized from- run the following: source install with a newer binary package, so watch for this and rebuild the port from source cd /usr/ports/biology/canu after the pkg upgrade if necessary.) make install If a FreeBSD port does not already exist for the The port frameworks used to build from source software you need, consider creating one. It’s not as can also be easily updated using or svn. difficult as one might assume. There is a rather steep This is a great feature for those who want to keep learning curve to becoming a FreeBSD ports commit- their systems running the latest of everything. But ter, who can add ports to the system only after what if you need to keep the same version of a pro- extensive quality control measures. gram running through a long-term study spanning However, ports need not be committed before several months or even years? Running pkg upgrade you can use the ports system to deploy them. In fact, could effectively break your study. all ports are deployed and tested before being com- It is possible, though not well-tested at this stage, to mitted. The learning curve for creating a basically deploy multiple ports trees under different prefixes. functional port or upgrading an existing port is fairly Ports can also be installed this way without root privi- small. Anyone who knows how to write a Makefile leges. The FreeBSD ports project branches snapshots and use the build system employed by the upstream every three months under different prefixes. The ports developers can learn to do this fairly quickly. The in these quarterly snapshots are never upgraded, Porter’s Handbook (https://www.freebsd.org/doc/ although they may receive bug and security patches. en_US.ISO8859-1/books/porters-handbook/) covers I have been experimenting with deploying quarterly most of what you would need to know, starting snapshots under prefixes such as /sharedapps/ from the beginner level all the way to becoming a ports-2018Q1, with corresponding installation to FreeBSD committer. /sharedapps/local-2018Q1. Software can be If you install binary packages from the FreeBSD installed statically here and never upgraded, while ports system, they can quickly and easily be updated other software installed to the standard prefix is using the following command: upgraded regularly with pkg upgrade.

Part of the NetActuate family.

Leverage Our Global Footprint of 32 Locations Worldwide

Deploy FreeBSD instantly via our web portal Cloud servers and colocation in all major global markets Customized, dedicated bare metal available Full root access 24x7 friendly support from FreeBSD experts on staff Request a quote and learn more today at netactuate.com

July/August 2018 7 This system follows a better-supported feature of the FreeBSD ports system provides tools for easily pkgsrc package manager, which is designed to be installing RPMs used by the RHEL/CentOS pack- bootstrapped on any POSIX platform under any prefix age manager. Creating a FreeBSD port that installs the user desires. We have been using pkgsrc this way software from the CentOS Yum repositories is trivial on our CentOS systems for years. in most cases, and many such ports you might need We could also use pkgsrc this way on FreeBSD, already exist. but there is a strong motivation to use the FreeBSD FreeBSD’s Linux compatibility is fairly robust and ports in a similar fashion, mainly because it has a capable of running the vast majority of commercial larger collection than pkgsrc at this time. Some work Linux binaries. I have personally run many versions remains to be done to utilize FreeBSD ports this way, of Linux Matlab on FreeBSD machines, with full but the basic support for this sort of deployment functionality, including the Java desktop and MEX already exists. We just need to put it into use and compilation system (using Linux GCC compilers). iron out the wrinkles. At this point, though, I would advise most FreeBSD users to run Octave, a full-featured, open- Base Features Beneficial to Science source Matlab-compatible suite. It’s virtually identical There is a strong interest in the advanced ZFS filesys- to Matlab for most typical users, it’s free, and can be tem in the scientific community. Its performance, flexi- installed in seconds. bility, and data protection features are very attractive There is, of course, some additional work required to users who invest immense amounts of time and to run Linux binaries on FreeBSD. Running some- money generating files containing their research thing as complex as Matlab or ANSYS may require a results. fair amount of effort. Simpler applications such as FreeBSD’s RootOnZFS features allow us to deploy shared-memory LS-DYNA are trivial to install. Linux a FreeBSD installation booting from a ZFS filesystem binaries depending on MPI (Message Passing using a simple menu interface in the . Any Interface) for distributed parallel computing could modern PC with multiple disks can be up and run- also be a challenge. If you rely heavily on complex ning with FreeBSD on a RAIDZ array in a matter of closed-source Linux applications, you may be better minutes. off running an Enterprise Linux system. ZFS is not suited for every purpose, however. ZFS In order for FreeBSD to become a competitive is rather memory-hungry. On an HPC compute node, platform in HPC, it would also need to more easily where local disks are used only to house the operat- support a few other subsystems that currently only ing system and provide temporary storage, we may work well on Linux, such as nVidia’s CUDA GPU not want ZFS competing for memory resources with platform (although the open standard OpenCL is the computational processes. The same reasoning beginning to gain traction) and Infiniband intercon- would apply to any other machine devoted mainly nects that are commonly used for distributed parallel to CPU- or memory-bound tasks. Fortunately, computing. FreeBSD’s UFS2 filesystem also provides solid per- There are a few remaining features that are likely formance and reliability, with a very low memory to ensure the dominance of Enterprise Linux in HPC footprint. for a while. FreeBSD provides enterprise reliability and easy As it stands, though, FreeBSD is already an excel- management, but what about that other big advan- lent platform for the needs of most typical scientific tage of Enterprise Linux, support for commercial soft- computing. If you mainly run open source software ware products? Fortunately, FreeBSD’s Linux compati- and prefer to spend your time doing science rather bility module allows us to run most Linux binaries, than IT maintenance, FreeBSD will serve you well. • with no performance penalty, trivial memory overhead, and a very modest amount of disk space and effort. The lead sysadmin for research computing at the FreeBSD’s Linux compatibility is often erroneously University of Wisconsin-Milwaukee, Jason Bacon has referred to as emulation or a compatibility layer. been working with computers since 1983, and with In reality, it is neither. The basis of FreeBSD’s Linux FreeBSD and Linux since 1995. He is the author of compatibility is a kernel module that directly sup- The /Unix 's Guide, a comprehen- ports Linux system calls. The module activates an sive guide for beginning to intermediate C/C++ alternative function pointer table to directly invoke programming under UNIX, Linux, OS X, Linux-compatible kernel functions when a Linux and similar systems. He is proficient in Spanish and binary is being run. German, with a working knowledge of French and To complete the Linux-compatible environment, we Mandarin. When he isn’t inside working on various simply need to install the Linux versions of any shared systems, he can often be found outside, cycling, libraries and tools required by Linux programs, the sea kayaking, cross-country skiing, hiking, and same as we would on a real Linux system. The scuba diving.

8 FreeBSD Journal