Green Threads in Rust
KEVIN ROSENDAHL

Additional Key Words and Phrases: Rust, Green Threads, Concurrency

ACM Reference format:
Kevin Rosendahl. 2017. Green Threads in Rust. 1, 1, Article 1 (December 2017), 7 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn

© 2017 Association for Computing Machinery.
This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in , https://doi.org/10.1145/nnnnnnn.nnnnnnn.

1 SUMMARY

For this project, I set out to implement a green threading library written in Rust that offers an interface as similar as possible to std::thread. The most challenging aspect was interfacing with Rust to accomplish this: compared to writing a green threading library in C, Rust was very difficult to convince that the implementation is legal. However, once the low level implementation was complete, working in Rust is much better than working in straight C.

In the end, I was able to create a working green thread library in Rust that exposes a std::thread-like interface. Along the way I learned about Rust's Foreign Function Interface (FFI), the underlying representation of closures and other data types, and gained a much stronger grasp on Rust's compile-time and runtime checking.

I was also able to investigate abstractions that can be built on top of green threads, and compare them to other concurrency abstractions that use paradigms other than green threads to attempt to solve similar problems.

2 BACKGROUND

2.1 Threading

Among the common concurrency paradigms, the most popular historically is threading. Threading allows the programmer to essentially write separate programs that execute concurrently.

The two main types of threads are "native" (or "OS") threads and "green" threads. Native threads are maintained by the operating system the program is running on. These threads will usually be multiplexed over multiple CPU cores if they are available, and are typically preemptively scheduled by the operating system. Preemptive scheduling means that the program does not have to explicitly yield control to another thread. Instead, the operating system itself will deschedule the thread from the core it is running on and schedule another thread on it automatically.

Green threads, on the other hand, are not scheduled by the operating system, but instead by a runtime running inside the program. This is sometimes referred to as "user level" threading (as opposed to "kernel level" threading). Commonly, these threads are scheduled "cooperatively."

Cooperative scheduling is the opposite of preemptive scheduling. In a cooperative scheduling paradigm, programs must explicitly yield control from the thread, usually back into the runtime, which then decides which thread to run next.

2.2 Green Threads in the Wild

Green threads can be found in a number of languages. The usage most in vogue right now is Go's "goroutine." Go implements a green threads runtime that multiplexes goroutines over multiple CPU cores. Go's runtime, however, is in charge of scheduling the goroutines; the program does not have to explicitly yield control.

Rust, in fact, has a history with green threads. A green threads runtime used to be the default paradigm for Rust code. Among other reasons (which will be addressed throughout the rest of the paper), the Rust team decided that having a systems language use a green threads runtime did not quite align. Thus, they decided to abandon the green thread runtime and switch to native threads.

2.3 Event Loops

Another concurrency pattern that is becoming increasingly popular in mainstream programming languages is the event loop. Event loops have events registered with them, and loop through all of the events, handling any that have notified the loop that they are ready to be handled. Once all events that had triggered for that loop have been handled, the event loop begins again, handling newly triggered events.

One popular example of an event loop is libuv, upon which node.js is built. Rust also has a popular event loop library, mio, upon which the tokio components are built.

2.4 Async I/O, Callbacks, Futures, Promises, and Async/Await

Event loops are commonly employed to run many different tasks that interact with asynchronous I/O. One method for handling triggered events is providing a callback function. This results in what is commonly referred to as "callback hell," where a program devolves into a series of unparsable callback function redirections.

To combat "callback hell," a number of abstractions have been created. Futures provide a value which will eventually be resolved into the value it wraps. Promises and the async/await syntax are attempts at wrapping futures to make the code look more like typical procedural code, rather than evented callback-style code.

One problem with these styles of abstractions is that they all still leak. Figure 1 shows the proposed syntax for async/await in Rust. At a glance, this looks great. The return type is a normal return type, not a future or any other weird type. However, upon further inspection, we see that the function has to be tagged with #[async]. And indeed, all functions which end up calling fetch_rust_lang will have to mark themselves with #[async]. Soon, your whole program is littered with async annotations, and every time you need to change
, Vol. 1, No. 1, Article 1. Publication date: December 2017. 1:2 • Rosendahl, K.
one function that previously did not call an asynchronous function, you have to annotate it and every function that ever calls it.

    #[async]
    fn fetch_rust_lang(client: hyper::Client) -> io::Result

Fig. 1. Proposed Rust async/await syntax

    #[repr(C)]

Fig. 2. ThreadContext

2.5 Relation to CS242

This project has strong ties to the lessons and themes of CS242. As I've just outlined, the concurrency paradigm that you choose to write your program with has drastic impacts on the syntax, control flow, and general feel of the code.

In addition to the more subjective (but nonetheless incredibly important) aspects described above, the paradigm you choose can also have an impact on the performance of your application. Later on we will discuss how the different design choices that can be made when writing a concurrency library can impact runtime performance.

    grn_context_switch:
        # Move current state to old_context
        mov %rsp, (%rdi)
        mov %r15, 0x08(%rdi)
        mov %r14, 0x10(%rdi)
        mov %r13, 0x18(%rdi)
        mov %r12, 0x20(%rdi)
        mov %rbx, 0x28(%rdi)
        mov %rbp, 0x30(%rdi)

        # Load new state from new_context
        mov (%rsi), %rsp
        mov 0x08(%rsi), %r15
        mov 0x10(%rsi), %r14
        mov 0x18(%rsi), %r13
        mov 0x20(%rsi), %r12
        mov 0x28(%rsi), %rbx
        mov 0x30(%rsi), %rbp

        ret

Fig. 3. grn_context_switch
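The save/restore offsets in grn_context_switch pin down the layout of the context struct. Fig. 2's struct body did not survive here, but the assembly implies a layout along the following lines; this is a reconstructed sketch, and the field names are my assumptions rather than the paper's exact code:

```rust
// ThreadContext as implied by grn_context_switch's offsets: the saved
// stack pointer at 0x00, then the x86_64 callee-saved registers at
// successive 8-byte offsets. Reconstructed sketch; names are assumptions.
#[repr(C)]
pub struct ThreadContext {
    pub rsp: u64, // 0x00: saved stack pointer
    pub r15: u64, // 0x08
    pub r14: u64, // 0x10
    pub r13: u64, // 0x18
    pub r12: u64, // 0x20
    pub rbx: u64, // 0x28
    pub rbp: u64, // 0x30: saved frame pointer
}

fn main() {
    // #[repr(C)] guarantees C layout: seven u64 fields, 56 bytes total.
    assert_eq!(std::mem::size_of::<ThreadContext>(), 0x38);
    println!("ThreadContext is {} bytes", std::mem::size_of::<ThreadContext>());
}
```

Only the callee-saved registers need to be stored: grn_context_switch is invoked as an ordinary function call, so the System V calling convention already makes the caller responsible for everything else.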
3 APPROACH

3.1 Why a Rust Green Threads Library?

Writing a green threads library in Rust was something I was very interested in doing. Last year in CS240, I implemented a very basic green threads library in C (with some assembly) as part of a lab[12]. This got me interested in one day diving deeper into different concurrency techniques. I had been developing node.js applications for a while, had gotten quite fed up with callback passing, and had started to use a fibers library[10].

Rust has been an interest of mine for a few years now. I've found its static analysis and compiler guarantees, along with its type system, to be incredibly powerful. However, despite writing a few toy projects in Rust, I hadn't yet had any real exposure to solving nontrivial problems in the language.

Writing a green threads library in Rust seemed to be a perfect marriage of these two interests.

3.2 Context Switching

A key aspect of a green threads library is the ability to context switch: that is, to switch the executing control flow from one green thread to another.

In order to do this, you need to have the context you want to switch into, and a place to store the current context you're switching out of. Figure 2 shows the Rust struct created to represent this. You can then use pointers to these structs as the arguments to our assembly function grn_context_switch (Fig. 3).

However, now we have to find a way to call this function from Rust. Enter Rust's Foreign Function Interface (FFI)[4]. Figure 4 shows the function we create in Rust in order to expose our assembly function.

    #[link(name = "context_switch", kind = "static")]
    extern {
        fn grn_context_switch(old_context: *mut libc::c_void,
                              new_context: *const libc::c_void);
    }

Fig. 4. Rust grn_context_switch FFI function

This Rust wrapper, along with a build script using the cc library, allows us to call out to the assembly function grn_context_switch. One interesting thing to note here is the annotation #[repr(C)] on the ThreadContext struct. This tells the Rust compiler that it should lay the struct out in memory as C would, which allows us to use the struct as we do in grn_context_switch.

3.3 Thread Stacks

Implementing the context switch required a decent amount of research into Rust's FFI, the compiler, how to get it to link, and the memory layout of structs. That said, it was a pretty straightforward implementation; no real design decisions had to be made.

Once you have context switching working, the next step is to create a stack for the eventual green thread to use. This sounds straightforward: just allocate some memory and call it a day. However, it turns out there are a large number of implications depending on the strategy you use for stack allocation and management.

The first style of stack is just like a normal operating system thread stack. You simply allocate a chunk of memory on the heap and use it as the stack. The main benefit of this technique is its simplicity. Additionally, if a thread ends up using a large portion of the stack, you get the benefit of only having to allocate memory once.

However, many times programs wish to use green threads as "lightweight tasks." Allocating a relatively large stack just to perform
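The fixed-size heap stack strategy can be sketched in a few lines. This is my own illustration rather than the library's ThreadStack, and STACK_SIZE is an assumed constant:

```rust
// Sketch of the "fixed-size heap stack" strategy: allocate a chunk of
// heap memory and hand the green thread its top as the initial stack
// pointer. Illustration only; STACK_SIZE is an assumption.
const STACK_SIZE: usize = 64 * 1024;

struct ThreadStack {
    inner: Vec<u8>, // backing allocation on the heap
}

impl ThreadStack {
    fn new() -> ThreadStack {
        ThreadStack { inner: vec![0u8; STACK_SIZE] }
    }

    // x86_64 stacks grow downward, so the initial stack pointer is the
    // highest address of the allocation, aligned down to 16 bytes as the
    // System V ABI requires.
    fn top(&mut self) -> *mut u8 {
        let end = unsafe { self.inner.as_mut_ptr().add(STACK_SIZE) };
        ((end as usize) & !0xf) as *mut u8
    }
}

fn main() {
    let mut stack = ThreadStack::new();
    let top = stack.top();
    assert_eq!(top as usize % 16, 0); // properly aligned for a new thread
    println!("stack top: {:p}", top);
}
```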
#[repr(C)] pub fn spawn
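The spawn interface can be modeled without any of the low-level machinery. The following toy model is hypothetical and simplified: it only mirrors the std::thread-like shape of the API, whereas the real library gives each task its own stack and context-switches in assembly rather than calling tasks inline:

```rust
use std::collections::VecDeque;

// A toy model of the runtime's public surface (hypothetical, simplified):
// spawn enqueues a boxed task, run drains the queue in order. The real
// library instead allocates a stack and ThreadContext per thread and
// switches between them with grn_context_switch.
struct Runtime {
    tasks: VecDeque<Box<dyn FnOnce()>>,
}

impl Runtime {
    fn new() -> Runtime {
        Runtime { tasks: VecDeque::new() }
    }

    // std::thread-like spawn: accept any 'static closure.
    fn spawn<F: FnOnce() + 'static>(&mut self, f: F) {
        self.tasks.push_back(Box::new(f));
    }

    // Run queued tasks to completion, returning how many ran.
    fn run(&mut self) -> usize {
        let mut ran = 0;
        while let Some(task) = self.tasks.pop_front() {
            task();
            ran += 1;
        }
        ran
    }
}

fn main() {
    let mut rt = Runtime::new();
    for i in 0..5 {
        rt.spawn(move || println!("in thread {}", i));
    }
    assert_eq!(rt.run(), 5);
    println!("back out in main");
}
```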
    pub fn spawn
        let callback : Box

    extern "C" fn call_fn(user_fn: *mut c_void, cb_fn: *mut c_void) {

Fig. 9. call_fn

object[5]. The long and short of this is that we cannot simply take the address of this value, pass it around, and call it later. The trait object contains more information than just the function: it also contains the environment the function closes over.

The next issue is that we don't want to call some global _grn_exit function; we want to call a method on the Runtime struct that's in charge of keeping track of the set of active green threads. Eventually we arrive at the following. We create a new global function called call_fn (Fig. 9). This function takes in two void pointers as arguments. It then transmutes these pointers (essentially casting them back into Rust's typed, owned scheme) into pointers to pointers to closures. The first argument should point at the user's function, and the second should point to the closure that is in charge of delegating back into the runtime after a thread exits. Fig. 10 shows the new grn_bootstrap_thread implementation, and Fig. 11 shows how some of the stack locations and the stack pointer were set.

Fig. 10. Rust grn_bootstrap_thread

    let callback_fn_ptr = Box::new(callback_fn_ptr);
    let callback_fn_stack_loc = stack_head.offset(-8isize);
    let callback_fn_ptr = &*callback_fn_ptr as *const _ as *mut _;
    std::ptr::copy(callback_fn_ptr, callback_fn_stack_loc, 8);

    let bootstrap_thread_ptr = Box::new(bootstrap_thread_stack_loc);
    let bootstrap_thread_ptr = &*bootstrap_thread_ptr as *const _ as *mut _;
    std::ptr::copy(bootstrap_thread_ptr, rsp_ptr, 8);

    std::mem::forget(wrapped_user_fn_ptr);
    std::mem::forget(callback_fn_ptr);

Fig. 11. Rust stack initialization

The Rust code seems verbose, but makes sense. However, what I have included in Fig. 11 includes a bug that plagued me for longer than I care to admit. A program would run completely fine all the way through, but would crash while cleaning up resources at the end. For some reason, Thread.stack.inner was a null pointer. Fig. 12 shows the Thread struct, along with its memory layout.

    #[repr(C)]
    pub struct Thread {
        pub id: u64,                // 0x00
        pub status: ThreadStatus,   // 0x08
        pub context: ThreadContext, // 0x10-0x38
        pub stack: ThreadStack,     // 0x40-0x48
    }

Fig. 12. Thread struct layout

As can be seen, the ThreadContext sits directly below the ThreadStack in the address space[3]; ThreadContext.rsp in particular is seven 8-byte words before the ThreadStack's pointer to where its slice is on the heap. When we copy pointers onto the stack, we use std::ptr::copy(src, dst, 8). The 8 is necessary since we are copying into a [u8], and a pointer on our supported platform (x86_64) is 8 bytes long. However, when we use the same call to copy the bootstrap_thread_ptr into ThreadContext.rsp, we are now copying into a u64. This copied values into ThreadContext.rsp and the seven 8-byte memory locations after it. The last of these locations was where the pointer to the stack was held, overwriting it with 0x0. Since none of our library code actually touches the stack (the thread simply uses it as a stack), no issue comes up while the threads are running. However, when we go to deallocate the ThreadStack, its pointer is null, and the program crashes.

I think this bug really highlights what Rust is trying to accomplish. C would let the programmer make that mistake as much as they wanted to, but Rust made me really work to shoot myself in the foot. And while debugging took a lot more time and a lot more gdb than I wish it had, I knew roughly where in the code the problem had to be: it had to be in one of the places that I had wrapped in unsafe.

3.5 Cross-thread Reference Passing

I wanted to have a Runtime struct that was in charge of spawning, yielding, and cleaning up threads when they exited. Fig. 13 shows the basic outline of these methods. As seen in Fig. 11, the callback we pass to spawn is simply self.exit. However, as we can see further down in yield_now_to_status, there we need to take several references into the runtime's state at once through a shared &self.
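The point above about closures being trait objects can be checked directly: a trait-object pointer carries both the function and its captured environment, which is why it cannot be smuggled through FFI as a single machine word. A small illustration, not the paper's code:

```rust
use std::mem::size_of;

// A boxed closure behind a trait object is a "fat" pointer: one word for
// the data (captured environment) and one for the vtable.
type Callback = dyn Fn() -> i32;

fn main() {
    let x = 41;
    let closure: Box<Callback> = Box::new(move || x + 1);

    // A trait-object reference is two machine words wide...
    assert_eq!(size_of::<&Callback>(), 2 * size_of::<usize>());
    // ...while a plain function pointer is a single word.
    assert_eq!(size_of::<fn() -> i32>(), size_of::<usize>());

    assert_eq!(closure(), 42);
    println!("trait-object pointer: {} bytes", size_of::<&Callback>());
}
```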
    pub fn yield_now(&self) {
        self.yield_now_to_status(ThreadStatus::Ready)
    }

    fn exit(&self) {
        self.yield_now_to_status(ThreadStatus::Zombie);
    }

Fig. 13. yield_now and exit

    fn yield_now_to_status(&self, status : ThreadStatus) {
        // pseudo code below
        let next_thread_idx = self.next_thread();
        let current_thread = self.threads.get(self.current);
        *current_thread.status = status;
        let cur_ptr = &current_thread.context;
        let new_ptr = &self.threads.get(next_thread_idx);
        self.current = next_thread_idx;

        unsafe {
            let cur_ptr = transmute::<*mut ThreadContext, *mut c_void>(cur_ptr);
            let new_ptr = transmute::<*mut ThreadContext, *mut c_void>(new_ptr);
            grn_context_switch(cur_ptr, new_ptr);
        }
    }

    fn yield_now_to_status(&self, status : ThreadStatus) {
        let cur_ptr;
        {
            let mut threads = self.threads.borrow_mut();
            let mut current_thread = threads.get_mut(&self.current.borrow()).unwrap();
            current_thread.status = status;
            cur_ptr = &((*current_thread).context) as *const _ as *mut _;
        }

        let next_thread_idx = self.next_thread();
        let new_ptr;
        {
            let mut threads = self.threads.borrow_mut();
            let mut new_thread = threads.get_mut(&next_thread_idx).unwrap();
            new_thread.status = ThreadStatus::Running;
            new_ptr = &((*new_thread).context) as *const _ as *mut _;
        }

        {
            let mut current = self.current.borrow_mut();
            *current = next_thread_idx;
        }

        unsafe {
            let cur_ptr = transmute::<*mut ThreadContext, *mut c_void>(cur_ptr);
            let new_ptr = transmute::<*mut ThreadContext, *mut c_void>(new_ptr);
            grn_context_switch(cur_ptr, new_ptr);
        }
    }

Fig. 15. yield_now_to_status
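The explicit inner blocks in Fig. 15 matter because RefCell enforces the borrow rules at runtime: a borrow that is still alive when grn_context_switch fires is never released, and the next borrow_mut panics. A minimal sketch of that behavior, independent of the library:

```rust
use std::cell::RefCell;

fn main() {
    let threads = RefCell::new(vec!["main"]);

    {
        // Explicit block: the mutable borrow is dropped at the closing
        // brace, mirroring the scoping discipline in yield_now_to_status.
        let mut t = threads.borrow_mut();
        t.push("thread-1");
    } // borrow released here

    // A fresh borrow now succeeds because the previous one was released.
    assert_eq!(threads.borrow().len(), 2);

    // If a borrow is still live, re-borrowing mutably fails at runtime;
    // this is exactly the panic the explicit blocks prevent.
    let live = threads.borrow();
    assert!(threads.try_borrow_mut().is_err());
    drop(live);
    assert!(threads.try_borrow_mut().is_ok());
}
```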
pub struct Runtime { threads: RefCell
Rust's borrow checker clearly won't allow this. The answer to this problem is std::cell::RefCell. The RefCell allows you to push Rust's borrow checker requirements to runtime. Here we know that there will de facto only ever be one thread operating on these objects at a time, but Rust's compiler is not able to understand that. So we wrap all of the members of Runtime that we need to access in RefCells, and call borrow and borrow_mut to access them. Fig. 14 shows what the Runtime struct ends up being defined as. Fig. 15 shows the main body of yield_now_to_status.

One interesting thing to note here is the explicit blocks required. My original attempt at the rewritten yield_now_to_status did not include the explicit blocks, and would panic, complaining that self.threads was borrowed mutably multiple times at once. This was because grn_context_switch would switch threads before the destructor of the temporarily borrowed value was run. This means that the temporarily borrowed value never released its ownership, and when the next thread got to yield_now_to_status and tried to call self.threads.borrow_mut, it saw it as already borrowed mutably and panicked.

4 RESULTS

I accomplished the task I originally set out to: build a green threads library in Rust with a reasonably ergonomic API. Fig. 16 shows a sample program and its output.

    $ cargo run
    in main 0
    in thread 0
    in main 1
    in thread 1
    in main 2
    in thread 2
    in main 3
    in thread 3
    in main 4
    in thread 4
    back out in main

Fig. 16. Sample program

I was also able to play around with implementing some higher level primitives. Notably, I implemented a sync function that runs a green thread, but does not require the invoking function to explicitly call yield_now. Fig. 17 shows a sample program using sync, and Fig. 18 shows sync's implementation.

I also wished to write a wrapper around an asynchronous I/O operation to be able to compare both the style and the speed of the green threads library against something like tokio. However, every async I/O library I found seemed to be coupled too tightly to either mio or some other paradigm to easily integrate with my library.
Barring being able to do that, I will qualitatively compare the I was also able to play around with implementing some higher library I have created and the possible abstractions that could be level primitives. Notably, I implemented a sync function that runs a built on top with past, present, and future Rust options.
    fn main() {
        thread::sync(|| {
            for i in 0..5 {
                println!("in thread {}", i);
                thread::yield_now();
            }
        });
        println!("back out in main");
    }

    $ cargo run
    in thread 0
    in thread 1
    in thread 2
    in thread 3
    in thread 4
    back out in main

Fig. 17. Sample program

4.1 libgreen

However, my library has one huge advantage over libgreen: you can actually use it today. libgreen has not been committed to in over 3 years, half a year before Rust hit v1. In examining the source you will not find Rust as you know it, but instead many complicated, now defunct syntax elements.

4.2 tokio

tokio is a bold attempt to implement a zero-cost abstraction over asynchronous I/O. The total amount of code you have to produce to get seriously impressive results is low, and it includes some nice combinators over the Futures that its functions return.

The technical work behind tokio is world class, yet the number one complaint they get is "it's confusing!"[8]. In fact, the initial v0.1 release is blocked on explaining to the world how they're async

    pub fn sync
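For comparison, sync's observable behavior — run the task to completion before returning, with no explicit yield_now required of the caller — can be modeled with a native thread. This is a hypothetical stand-in, not Fig. 18's implementation:

```rust
use std::sync::mpsc;
use std::thread;

// Semantic model of the sync helper using a native thread instead of a
// green thread (illustration only): spawn the task and block until it
// completes, so the caller never yields explicitly.
pub fn sync<F>(f: F)
where
    F: FnOnce() + Send + 'static,
{
    thread::spawn(f).join().unwrap();
}

fn main() {
    let (tx, rx) = mpsc::channel();
    sync(move || {
        for i in 0..5 {
            println!("in thread {}", i);
        }
        tx.send(()).unwrap();
    });
    // sync returned only after the task finished, so the message is waiting.
    assert!(rx.try_recv().is_ok());
    println!("back out in main");
}
```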
REFERENCES
[1] 0000-remove-runtime. https://github.com/aturon/rfcs/blob/remove-runtime/active/0000-remove-runtime.md
[2] Contiguous stacks. https://docs.google.com/document/d/1wAaf1rYoM4S4gtnPh0zOlGzWtrZFQ5suE8qr2sD8uWQ/pub
[3] Rust Container Cheat Sheet. https://docs.google.com/presentation/d/1q-c7UAyrUlM-eZyTo1pd8SZ0qwA_wYxmPZVOQkoDmH4/edit#slide=id.p
[4] Foreign Function Interface. https://doc.rust-lang.org/book/first-edition/ffi.html
[5] Trait Objects. https://doc.rust-lang.org/book/first-edition/trait-objects.html
[6] Zero-cost futures in Rust. https://aturon.github.io/blog/2016/08/11/futures/
[7] alexcrichton. libgreen. https://github.com/alexcrichton/green-rs
[8] alexcrichton. 2017. Restructure documentation from the ground up. (June 2017). https://github.com/tokio-rs/tokio/issues/13
[9] Brian Anderson. 2013. Abandoning segmented stacks in Rust. (Nov. 2013). https://mail.mozilla.org/pipermail/rust-dev/2013-November/006314.html
[10] laverdet. node-fibers. https://github.com/laverdet/node-fibers
[11] Bob Nystrom. 2015. What Color is Your Function? (Feb. 2015). http://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/
[12] CS240 Staff. Lab 1: Cooperative Green Threads. http://www.scs.stanford.edu/17sp-cs240/labs/lab1/
[13] steveklabnik. 2015. Async IO. (April 2015). https://github.com/rust-lang/rfcs/issues/1081