
Work / Technology & tools

WHY SCIENTISTS ARE TURNING TO RUST

Despite having a steep learning curve, the programming language offers speed and safety. By Jeffrey M. Perkel

Illustration by The Project Twins

In 2015, bioinformatician Johannes Köster was what he called “kind of a full-time Python guy”. He had already written one popular tool, the workflow manager Snakemake, in the programming language. Now he was contemplating a project that required a level of computational performance that Python simply couldn’t deliver. So he began casting about for something new.

Köster, now at the University of Duisburg-Essen in Germany, was looking for a language that offered the “expressiveness” of Python but the speed of languages such as C and C++. In other words, “a high-performance language that is still, let’s say, ergonomic to use”, he explains. What he found was Rust.

First created in 2006 by Graydon Hoare as a side project while working at browser-developer Mozilla, headquartered in Mountain View, California, Rust blends the performance of languages such as C++ with friendlier syntax, a focus on code safety and a well-engineered set of tools that simplify development. Portions of Mozilla’s Firefox browser are written in Rust, and developers at Microsoft are reportedly using it to recode parts of the Windows operating system. The annual Stack Overflow Developer Survey, which this year polled nearly 65,000 programmers, has ranked Rust as the “most loved” programming language for 5 years running. The code-sharing site GitHub says Rust was the second-fastest-growing language on the platform in 2019, up 235% from the previous year.

Scientists, too, are turning to Rust. Köster, for instance, used it to create an application, called Varlociraptor, that compares millions of sequence reads against billions of genetic bases to identify genomic variants. “This is huge data,” he says. “So that needs to be as fast as possible.” But that power comes at a cost: the Rust learning curve is steep.

“It does take some up-front time,” says Carol Nichols, a member of the Rust core team and founder of the consultancy firm Integer 32 in Pittsburgh, Pennsylvania. “But it has enabled me to do things that I wouldn’t otherwise be able to do. I see that time as well spent.”

Caution: no guide rails

Workflows for analysing scientific data tend to use languages such as Python, R and Matlab. These interpret lines of code one by one and then execute them, a style of programming that is good for exploring data, but not at speed.

C and C++ are fast, but they have “no guide rails”, says Ashley Hauck, a Rust programmer (or ‘Rustacean’, as community members are known) in Stockholm. For instance, there are no controls that stop a C or C++ programmer from inappropriately accessing memory that has already been released back to the operating system, or to prevent the program from releasing the same piece twice. In the best-case scenario, this would cause the program to crash. But it can also return meaningless data or expose security vulnerabilities. According to researchers at Microsoft, 70% of the security bugs that the company fixes each year relate to memory safety.

Memory rules

Rust’s model uses rules to assign each piece of memory to a single owner and enforce who can access it. Code that violates those rules never gets the chance to crash: it won’t compile. “They have a memory-management system that is based on this concept of lifetimes that lets the compiler track at compile-time when memory is allocated, when it’s freed, who owns it, who can access it,” explains Rob Patro, a computational biologist at the University of Maryland, College Park. “There’s an entire large class of correctness errors that go away very simply by virtue of the way the language is designed.”

The same guarantees help to ensure that parallelized code (software written to run on multiple processors) can run safely, for instance by eliminating the possibility that multiple computational threads will access the same data at the same time.

The result is a language that is easier to maintain and debug, but harder to learn. “No other mainstream languages really have these concepts, and they’re really core to understanding a lot of how you have to write code in Rust,” Nichols says. Stephan Hügel, who studies the visualization of geographical data at Trinity College Dublin, estimates that he spent two or three months porting a Python algorithm for converting geospatial coordinates from one reference system into another into Rust, achieving fourfold faster execution. Richard Apodaca, founder of the cheminformatic-software company Metamolecular in La Jolla, California, says it took him about six months to become proficient in the language.

Focus on usability

To compensate, Rust’s developers have optimized the user experience, says Manish Goregaokar, who leads the Rust developer-tooling team and is based in Berkeley, California. For instance, the compiler produces particularly informative error messages, even highlighting offending code and suggesting how to fix it. “If your language is going to introduce a novel concept, it had better be pleasant to work with,” Goregaokar explains.

The Rust community also provides extensive documentation and online help, including a popular online reference called the Book and a ‘Cookbook’ of recipes for solving common problems. Users praise the Rust toolchain, the applications that programmers use to turn code into applications (see ‘Let’s get oxidizing’). “The tooling and infrastructure around Rust is really phenomenal,” Patro says. Unlike the many compilers and ancillary utilities that programmers use to build C code, Rustaceans can use a single tool, called Cargo, to compile Rust code, run tests, auto-generate documentation, upload a package to a repository and more. It also downloads and installs third-party packages automatically. A Cargo plug-in called Clippy flags common errors and ‘non-idiomatic’ Rust code, a feature that Patro calls “absolutely phenomenal”.

There are Rust plug-ins for popular development environments, such as Microsoft’s Visual Studio Code and JetBrains’ IntelliJ, as well as a Rust ‘playground’ that provides a live, online Rust environment for code experimentation. And David Lattimore, a software developer in Sydney, Australia, created a ‘kernel’ for using Rust in Jupyter computational notebooks, as well as a Python-style interactive environment called a REPL (read-evaluate-print loop).

Aiding development is Rust’s ecosystem of third-party packages, or ‘crates’, currently numbering nearly 50,000 (see ‘Rust rising’). These encapsulate algorithms in disciplines such as bioinformatics (Köster’s Rust-Bio), geosciences (the Geo-Rust project) and mathematics (nalgebra). Still, says Nichols, “that could definitely tip the balance away from Rust, if the libraries you need are just not in Rust”. Programmers can sometimes bridge that gap using Rust’s ‘foreign function interface’, however.

Oxidized code

Coding logistics aside, what’s undeniable is that Rust is fast. In May, bioinformatician Heng Li at the Dana-Farber Cancer Institute in Boston, Massachusetts, tested multiple languages on a computational-biology task that involved parsing 5.7 million sequence records. Rust edged out C to take the top spot. “When we want to write a high-performance program using multiple threads, and also if you need it to be very fast and also compact in memory, then Rust is the ideal choice,” Li says.

Luiz Irber, a bioinformatician at the University of California, Davis, used Rust to recode (or ‘oxidize’, in Rust parlance) a tool called Sourmash, which performs genomic searches and taxonomic profiling, to ease software maintenance, gain access to modern language features and make the code work in a web browser, he says.

Led by graduate student Hirak Sarkar, Patro’s team used Rust to build a gene-expression analysis tool called Terminus after team member Avi Srivastava returned from an internship at 10x Genomics, a biotechnology company in Pleasanton, California, that uses Rust to develop open-source tools. “The beauty of Rust is, it makes the task of debugging very easy, because memory management is much, much better,” explains Srivastava, who is now at the New York Genome Center.

But for many Rustaceans, the human element is equally compelling. Hauck, a member of the LGBT+ community, says that Rust users have gone out of their way to make her feel welcome. The community, she says, has “always made an effort to be extremely inclusive, like, very much aware of how diversity impacts things; very aware of how to write a code of conduct and enforce that code of conduct”.

“That’s probably a majority of the reason I’m still writing Rust,” Hauck says. “It’s because the community is so fantastic.”

Jeffrey M. Perkel is technology editor at Nature.

Nature | Vol 588 | 3 December 2020 | 185 ©2020 Springer Nature Limited. All rights reserved.

RUST RISING

The Rust packages repository https://crates.io has grown sharply since 2016, mirroring the rapid uptake of the language.

[Figure: number of packages (thousands), 2014 to 2020, for PyPI (Python), Crates.io (Rust) and CRAN (R), with missing data marked. Source: http://www.modulecounts.com]

LET’S GET OXIDIZING

Here’s how to create a GenBank file reader so you can explore some of the features of Rust.

• Install Rust at www.rust-lang.org/learn/get-started
• Clone the GitHub repository at https://github.com/jperkel/gb_read
• Execute ‘cargo run’ from the command line to download external dependencies and build the application. By default, the application parses the GenBank file ‘nc_005816.gb’ in the GitHub repository, but you can specify an alternative input file with ‘cargo run ’
• Execute the included tests using ‘cargo test’.
• Create and view documentation with ‘cargo doc --open’.
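
The ownership and thread-safety rules described under ‘Memory rules’ can be illustrated with a minimal sketch. It is not taken from any of the tools mentioned in this article: the function names and the toy sequence records are hypothetical, and the example assumes Rust 1.63 or later, which provides `std::thread::scope` in the standard library.

```rust
use std::thread;

/// Sums the lengths of all records (hypothetical helper).
/// It only borrows the data (`&[String]`), so the caller keeps ownership
/// and can continue to use `records` afterwards.
fn total_bases(records: &[String]) -> usize {
    records.iter().map(|r| r.len()).sum()
}

/// Takes ownership of `records` (hypothetical helper).
/// The vector is freed automatically when this function returns.
fn consume(records: Vec<String>) -> usize {
    records.len()
}

/// Counts G and C bases using one scoped thread per record.
/// Each thread receives only a shared, read-only reference, so the
/// compiler can verify that no two threads modify the same data.
fn gc_count_parallel(records: &[String]) -> usize {
    thread::scope(|s| {
        let handles: Vec<_> = records
            .iter()
            .map(|r| s.spawn(move || r.chars().filter(|&c| c == 'G' || c == 'C').count()))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .sum::<usize>()
    })
}

fn main() {
    let records = vec![String::from("GATTACA"), String::from("CCGG")];

    // Borrowing: `records` is still usable after each of these calls.
    println!("bases: {}", total_bases(&records));
    println!("GC bases: {}", gc_count_parallel(&records));

    // Moving: ownership passes to `consume`, which frees the data when done.
    println!("records: {}", consume(records));

    // Uncommenting the next line makes the program fail to compile,
    // because `records` was moved in the call above.
    // println!("{}", records.len());
}
```

The commented-out final line is the kind of use-after-release mistake that a C or C++ compiler would accept; here it is rejected at compile time rather than surfacing as a crash at run time.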

186 | Nature | Vol 588 | 3 December 2020 | Corrected 3 December 2020 | Corrected 11 December 2020 ©2020 Springer Nature Limited. All rights reserved.

Correction

This Technology feature gave the wrong source for the graphic entitled ‘Rust rising’. The data came from http://www.modulecounts.com, not from https://crates.io. Also, the story erred in stating that the Clippy plug-in is a third-party component. Finally, it erroneously stated that 70% of the bugs that Microsoft fixed each year relate to memory safety. In fact, it is 70% of the security bugs, not all bugs.

Nature | Corrected 2 December 2020 ©2020 Springer Nature Limited. All rights reserved.