RedLeaf: Isolation and Communication in a Safe Operating System

Vikram Narayanan¹, Tianjiao Huang¹, David Detweiler¹, Dan Appel¹, Zhaofeng Li¹, Gerd Zellweger², Anton Burtsev¹

OSDI ’20

¹ University of California, Irvine

² VMware Research

History of Isolation

[Figure: timeline of isolation in systems, with year marks at 1973, 1980, 1983, 1995, 1996, 1999, 2002, and 2005. Systems shown: Multics, Pilot, Cedar, Scomp, SPIN, VINO, J-Kernel, KaffeOS, Mondrian, Singularity]

• Isolation of kernel subsystems
• Final report of Multics (1976)
• Scomp (1983)
• Systems remained monolithic
• Isolation was expensive

Isolation mechanisms

• Hardware Isolation
  • Segmentation (46 cycles) [1]
  • Page table isolation (797 cycles) [2]
  • VMFUNC (396 cycles) [3]
  • Memory protection keys (20-26 cycles) [4]
• Language based isolation
  • Compare drivers written (DPDK-style) in a safe high-level language (Rust, Go, C#, etc.) [5]
  • Managed runtime and Garbage collection (20-50% overhead on a device-driver workload)

[1] L4 Microkernel: Jochen Liedtke
[2] https://sel4.systems/About/Performance/
[3] Lightweight Kernel Isolation with Virtualization and VM Functions, VEE 2020
[4] Hodor: Intra-process isolation for high-throughput data plane libraries
[5] The Case for Writing Network Drivers in High-Level Programming Languages, ANCS 2019

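Traditional Safe languages vs Rust

[Figure: in Java, C#, etc., references A and B point to the same Vector, and once both are gone the Vector is left for garbage collection. In Rust, the Vector has a single owner: first A, then B.]

Rust

• Linear types
• Enforces type and memory safety
• Statically checked at compile time
• Safety without runtime garbage collection overhead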
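A minimal sketch of the single-owner behavior listed above. The names `a` and `b` follow the figure labels; `transfer` is a hypothetical helper introduced only for illustration.

```rust
/// Takes ownership of the vector and hands it back, standing in for a
/// hand-off from one owner to another (hypothetical helper).
fn transfer(v: Vec<u32>) -> Vec<u32> {
    v
}

fn main() {
    let a = vec![1, 2, 3]; // `a` owns the vector
    let b = transfer(a);   // ownership moves; `a` can no longer be used
    // println!("{:?}", a); // would not compile: use of moved value `a`
    println!("{:?}", b);
} // `b` goes out of scope here and the vector is freed without a garbage collector
```

Uncommenting the marked line makes the compiler reject the program, which is the "statically checked at compile time" property in practice.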

Language-based isolation - Rust

• Today, Rust is mostly used as a drop-in replacement for C
• Numerous possibilities
  • Fault isolation
  • Transparent device-driver recovery
  • Safe kernel extensions
  • Fine-grained capability-based access control, etc.

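Fault isolation in Language-based systems

[Figure: Domain Foo and Domain Bar; references to obj1 and obj2 cross the domain boundary directly, and an X marks a fault during do_work(obj1)]

    // Domain Foo
    fn foo(obj1: &Object1, obj2: &Object2) {
        do_work(obj1);
        do_work(obj2);
    }

    // Domain Bar
    fn bar() {
        do_foo(&obj1, &obj2);
    }

• e.g., SPIN (using Modula-3 pointers)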

Language-based isolation: Deep copy

[Figure: Domain Foo and Domain Bar; an X marks a fault at the call_other call]

    // Domain Foo
    fn foo(obj1: Object1, obj2: Object2) {
        call_other(obj1, obj2);
    }

    // Domain Bar
    fn bar(....) {
        do_work(&obj1, &obj2);
    }

• e.g., J-Kernel, KaffeOS

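Language-based isolation: Capabilities

[Figure: Domain Foo and Domain Bar; X marks show a fault at the call_other call and the crossed-out objects that result]

    // Domain Foo
    fn foo(obj1: Object1, obj2: Object2) {
        call_other(obj1, &obj2);
    }

    // Domain Bar
    fn bar(....) {
        do_work(&obj1, &obj2);
    }

• e.g., J-Kernel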

Language-based isolation: Singularity

[Figure: Domain Foo and Domain Bar exchange objects through a shared heap; static analysis and verification tools check the code]

    // Domain Foo
    fn foo(obj1: Object1, obj2: Object2) {
        call_other(obj1, &obj2);
    }

    // Domain Bar
    fn bar(....) {
        do_work(&obj1, &obj2);
    }

• Statically enforced ownership discipline
• Single ownership model
• Static analysis and verification tools (Sing#)
• Reusing a moved object returns an error
• Zero-copy

RedLeaf

Architecture

[Figure: RedLeaf architecture. A microkernel hosts compiler-enforced protection domains: RedLeaf User and rv6 User domains, rv6 Core, rv6 FS, rv6 Net, and the NVMe and Ixgbe driver domains (each with a trusted crate). Domains call each other through proxies and pass RRef objects via the shared heap.]

Fault Isolation

• After a domain crash:
  • Unwind all threads running inside
  • Subsequent invocations return error
  • All resources are deallocated
  • Other threads continue execution
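The first two steps above can be sketched in ordinary Rust. This is an illustration under stated assumptions, not RedLeaf's mechanism: `Domain`, `invoke`, and `RpcError` are hypothetical names, and the standard library's `catch_unwind` stands in for the kernel's own unwinding. It requires the default unwinding panic strategy.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::sync::atomic::{AtomicBool, Ordering};

#[derive(Debug, PartialEq)]
enum RpcError {
    DomainCrashed,
}

/// Hypothetical handle for one isolated domain.
struct Domain {
    alive: AtomicBool,
}

impl Domain {
    fn new() -> Self {
        Domain { alive: AtomicBool::new(true) }
    }

    /// Runs `f` "inside" the domain. A panic is caught at the boundary,
    /// the domain is marked dead, and the caller gets an error instead.
    fn invoke<R>(&self, f: impl FnOnce() -> R) -> Result<R, RpcError> {
        if !self.alive.load(Ordering::SeqCst) {
            return Err(RpcError::DomainCrashed); // subsequent invocations fail
        }
        match panic::catch_unwind(AssertUnwindSafe(f)) {
            Ok(value) => Ok(value),
            Err(_) => {
                self.alive.store(false, Ordering::SeqCst);
                Err(RpcError::DomainCrashed)
            }
        }
    }
}

fn main() {
    let dom = Domain::new();
    println!("{:?}", dom.invoke(|| 1 + 1));
    println!("{:?}", dom.invoke(|| -> () { panic!("driver bug") }));
    println!("{:?}", dom.invoke(|| 3)); // the caller keeps running
}
```

Resource deallocation is left out here; in the talk it relies on the heap registry described later.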

Heap Isolation

[Figure: Domain Foo and Domain Bar each have a private heap; an RRef object lives in the shared heap]

    // Domain Foo
    fn foo(obj1: Object1, obj2: RRef) {
        call_other(obj1, &obj2);
    }

    // Domain Bar
    fn bar(....) {
        do_work(&obj1, &obj2);
    }

• Domains never hold pointers into other domains
• Special shared heap for passing objects between domains

Exchangeable types

[Figure: Domain Foo with its private heap and RRef objects in the shared heap; pointers from an RRef to an ordinary Object are crossed out]

    // Domain Foo
    fn foo(obj1: Object1, obj2: RRef) {
        call_other(obj1, &obj2);
    }

• Objects in the shared heap can only be exchangeable types
• Exchangeable types cannot contain ordinary pointers into the shared heap or a private heap
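One way to picture the rule above is a marker trait that only pointer-free types and RRefs implement. This is a sketch under assumptions: the trait name `Exchangeable`, the `Packet` type, and the Box-backed `RRef` are hypothetical, and the manual `impl` for `Packet` is not checked here (a real system would have to generate or verify it).

```rust
/// Hypothetical marker trait for types allowed on the shared heap.
trait Exchangeable {}

// Plain data is exchangeable.
impl Exchangeable for u8 {}
impl Exchangeable for u32 {}
impl Exchangeable for u64 {}
impl<T: Exchangeable, const N: usize> Exchangeable for [T; N] {}

/// Stand-in for a shared-heap reference; it can only wrap exchangeable types.
struct RRef<T: Exchangeable> {
    value: Box<T>,
}

// An RRef may itself be stored inside another exchangeable object.
impl<T: Exchangeable> Exchangeable for RRef<T> {}

impl<T: Exchangeable> RRef<T> {
    fn new(value: T) -> Self {
        RRef { value: Box::new(value) }
    }
    fn get(&self) -> &T {
        &self.value
    }
}

/// Composite type built only from exchangeable fields (hypothetical).
struct Packet {
    len: u32,
    payload: RRef<[u8; 4]>,
}
impl Exchangeable for Packet {}

fn main() {
    let pkt = RRef::new(Packet { len: 4, payload: RRef::new([1, 2, 3, 4]) });
    println!("{} {:?}", pkt.get().len, pkt.get().payload.get());
    // RRef::new(Box::new(5u32)); // rejected: Box<u32> is not Exchangeable
    // RRef::new(&5u32);          // rejected: &u32 is not Exchangeable
}
```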

Ownership tracking

    struct RRef<T> {
        domain_id: u32,
        borrow_cnt: u32,
        data_ptr: *mut T
    }

[Figure: Domain Foo, Trusted Proxy, and Domain Bar; the RRef lives in the shared heap]

    // Domain Foo
    fn foo(obj1: RRef, obj2: Object2) {
        call_other(obj1, &obj2);
    }

    // Trusted Proxy
    fn proxy_bar() {
        domain_id = x;
        borrow_cnt = y;
    }

    // Domain Bar
    fn bar(....) {
        do_work(&obj1, &obj2);
    }

• RRefs can be passed between domains
• Metadata keeps track of owner domain and ref count
• Mediated through trusted proxies
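A sketch of how a proxy might update the two metadata fields, assuming a pass-by-value call moves ownership and a pass-by-reference call only bumps the borrow count. The helpers `proxy_move` and `proxy_borrow` are hypothetical, and the payload is a `Box` rather than the slide's raw pointer so the example stays in safe Rust.

```rust
/// Sketch of the RRef metadata from the slide.
struct RRef<T> {
    domain_id: u32,  // current owner domain
    borrow_cnt: u32, // number of outstanding borrows
    data: Box<T>,
}

impl<T> RRef<T> {
    fn new(owner: u32, value: T) -> Self {
        RRef { domain_id: owner, borrow_cnt: 0, data: Box::new(value) }
    }
}

/// Hypothetical proxy path for an RRef passed by value: ownership moves to the callee.
fn proxy_move<T, R>(callee: u32, mut arg: RRef<T>, f: impl FnOnce(RRef<T>) -> R) -> R {
    arg.domain_id = callee;
    f(arg)
}

/// Hypothetical proxy path for an RRef passed by reference: the owner stays
/// the same and the borrow count is raised only for the duration of the call.
fn proxy_borrow<T, R>(arg: &mut RRef<T>, f: impl FnOnce(&T) -> R) -> R {
    arg.borrow_cnt += 1;
    let result = f(&*arg.data);
    arg.borrow_cnt -= 1;
    result
}

fn main() {
    let buf = RRef::new(1, [1u8, 2, 3, 4]);          // allocated by domain 1
    let mut buf = proxy_move(2, buf, |moved| moved); // now owned by domain 2
    println!("owner: {}", buf.domain_id);
    let sum = proxy_borrow(&mut buf, |bytes| bytes.iter().map(|&b| b as u32).sum::<u32>());
    println!("sum: {}, outstanding borrows: {}", sum, buf.borrow_cnt);
}
```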

Heap reclamation

[Figure: Dom A, Proxy, and Dom B above the microkernel. The shared heap holds object X (dom_id: A) and object Y (y: RRef, dom_id: _); the microkernel side holds the heap registry.]

    // Dom A
    ... b.foo(x); ...

    // Proxy
    fn foo(x: RRef) -> RpcResult<()> {
        ...
        x.move_to(B);
        ret = b.foo(x);
        ...
    };

    // Dom B
    fn foo(x: RRef) -> RpcResult<()> { ... };

• Maintains a global registry of allocated objects
• On panic
  • Deallocate all objects owned by the crashing domain
  • Defer deallocation of borrowed RRefs until their ref count is zero
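The reclamation policy above can be sketched as a small registry. Everything here is an assumption made for illustration: the `HeapRegistry` type, its method names, and the use of integer ids in place of real allocations.

```rust
use std::collections::HashMap;

/// Per-object metadata kept by the (hypothetical) registry.
struct Entry {
    owner: u32,
    borrow_cnt: u32,
    orphaned: bool, // owner crashed while the object was still borrowed
}

#[derive(Default)]
struct HeapRegistry {
    next_id: u64,
    live: HashMap<u64, Entry>,
}

impl HeapRegistry {
    fn alloc(&mut self, owner: u32) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.live.insert(id, Entry { owner, borrow_cnt: 0, orphaned: false });
        id
    }

    fn contains(&self, id: u64) -> bool {
        self.live.contains_key(&id)
    }

    fn borrow(&mut self, id: u64) {
        if let Some(e) = self.live.get_mut(&id) {
            e.borrow_cnt += 1;
        }
    }

    /// Ends a borrow; frees the object if its owner has already crashed.
    fn release(&mut self, id: u64) {
        let free_now = match self.live.get_mut(&id) {
            Some(e) => {
                e.borrow_cnt = e.borrow_cnt.saturating_sub(1);
                e.orphaned && e.borrow_cnt == 0
            }
            None => false,
        };
        if free_now {
            self.live.remove(&id);
        }
    }

    /// Frees everything the crashed domain owns, except objects that are
    /// still borrowed; those are freed later by `release`.
    /// Returns the number of objects freed immediately.
    fn on_panic(&mut self, crashed: u32) -> usize {
        let before = self.live.len();
        for e in self.live.values_mut() {
            if e.owner == crashed {
                e.orphaned = true;
            }
        }
        self.live.retain(|_, e| !(e.orphaned && e.borrow_cnt == 0));
        before - self.live.len()
    }
}

fn main() {
    let mut reg = HeapRegistry::default();
    let a = reg.alloc(1);
    let b = reg.alloc(1);
    reg.borrow(b); // another domain is still using `b`
    println!("freed at crash: {}", reg.on_panic(1));
    reg.release(b); // last borrow ends, `b` is freed now
    println!("a live: {}, b live: {}", reg.contains(a), reg.contains(b));
}
```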

Cross-domain call proxying

    struct RRef<T> {
        domain_id: u32,
        borrow_cnt: u32,
        data_ptr: *mut T
    }

[Figure: Domain Foo, Trusted Proxy, and Domain Bar; the RRef lives in the shared heap]

    // Domain Foo
    fn foo(obj1: RRef, obj2: Object2) {
        call_other(obj1, &obj2);
    }

    // Trusted Proxy
    fn proxy_bar() {
        is_alive();
        create_cont();
        domain_id = x;
        borrow_cnt = y;
    }

    // Domain Bar
    fn bar(....) {
        do_work(&obj1, &obj2);
    }

• Checks if domain is alive
• Creates continuation
• Moves ownership of all RRefs

Interface validation

• Validate domain interfaces
• Generate proxies to enforce ownership discipline
• e.g., Block Device domain interface

    pub trait BDev {
        fn read(&self, block: u32, data: RRef<[u8; BSIZE]>) -> RpcResult<RRef<[u8; BSIZE]>>;
        fn write(&self, block: u32, data: &RRef<[u8; BSIZE]>) -> RpcResult<()>;
    }
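To make the proxy idea concrete, here is a hand-written sketch of what a generated proxy for the `BDev` interface might look like. Assumptions: `BSIZE = 512`, the simplified `RRef` and `RpcError` types, the `RamDisk` device, and the `BDevProxy` fields are all hypothetical; continuation creation is omitted.

```rust
use std::cell::RefCell;

const BSIZE: usize = 512; // assumed block size; the slide gives no value

/// Simplified stand-in for RRef: only the owner field is modeled.
struct RRef<T> {
    domain_id: u32,
    data: Box<T>,
}

#[derive(Debug, PartialEq)]
enum RpcError {
    DomainDead,
}
type RpcResult<T> = Result<T, RpcError>;

/// The interface from the slide.
trait BDev {
    fn read(&self, block: u32, data: RRef<[u8; BSIZE]>) -> RpcResult<RRef<[u8; BSIZE]>>;
    fn write(&self, block: u32, data: &RRef<[u8; BSIZE]>) -> RpcResult<()>;
}

/// Hypothetical in-memory block device playing the callee domain.
struct RamDisk {
    blocks: RefCell<Vec<[u8; BSIZE]>>,
}

impl BDev for RamDisk {
    fn read(&self, block: u32, mut data: RRef<[u8; BSIZE]>) -> RpcResult<RRef<[u8; BSIZE]>> {
        *data.data = self.blocks.borrow()[block as usize];
        Ok(data)
    }
    fn write(&self, block: u32, data: &RRef<[u8; BSIZE]>) -> RpcResult<()> {
        self.blocks.borrow_mut()[block as usize] = *data.data;
        Ok(())
    }
}

/// Proxy exposing the same trait while interposing on every call.
struct BDevProxy<D: BDev> {
    inner: D,
    caller_id: u32,
    callee_id: u32,
    alive: bool,
}

impl<D: BDev> BDev for BDevProxy<D> {
    fn read(&self, block: u32, mut data: RRef<[u8; BSIZE]>) -> RpcResult<RRef<[u8; BSIZE]>> {
        if !self.alive {
            return Err(RpcError::DomainDead);
        }
        data.domain_id = self.callee_id; // by-value argument moves to the callee
        let mut out = self.inner.read(block, data)?;
        out.domain_id = self.caller_id; // result moves back to the caller
        Ok(out)
    }
    fn write(&self, block: u32, data: &RRef<[u8; BSIZE]>) -> RpcResult<()> {
        if !self.alive {
            return Err(RpcError::DomainDead);
        }
        self.inner.write(block, data) // borrowed argument: owner is unchanged
    }
}

fn main() {
    let disk = RamDisk { blocks: RefCell::new(vec![[0u8; BSIZE]; 4]) };
    let proxy = BDevProxy { inner: disk, caller_id: 1, callee_id: 2, alive: true };
    let buf = RRef { domain_id: 1, data: Box::new([7u8; BSIZE]) };
    proxy.write(0, &buf).unwrap();
    let out = proxy.read(0, RRef { domain_id: 1, data: Box::new([0u8; BSIZE]) }).unwrap();
    println!("owner {} first byte {}", out.domain_id, out.data[0]);
}
```

Because the proxy implements the same trait as the device, callers need no changes to go through it.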

RedLeaf

[Figure: RedLeaf architecture diagram (shared heap, RRef, proxies, rv6 and driver domains, microkernel)]

• Heap isolation
• Exchangeable types
• Ownership tracking
• Interface validation
• Cross-domain call proxying

Device driver Recovery

[Figure: Domain Foo, Shadow domain, Trusted Proxy, and Domain Bar; Domain Foo has a private heap]

    // Domain Foo
    fn foo(obj1: Object1, obj2: RRef) {
        call_other(obj1, &obj2);
    }

    // Shadow domain
    fn bar() {
        ...
        replay_init();
    }

    // Trusted Proxy
    fn proxy_bar() {
        check_if_alive();
        create_cont();
        return Err(..);
    }

    // Domain Bar
    fn bar(....) {
        do_work(&obj1, &obj2);
    }

• Supports transparent device driver recovery
• Wraps the interface to expose an identical interface
• Interposes on all communication
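A sketch of the shadow idea under simple assumptions: the shadow implements the same trait as the driver, and when a call fails it recreates the driver and retries, so the caller never sees the crash. The `NetDev` trait, `FlakyDriver`, and the `restart` closure (standing in for `replay_init()`) are hypothetical.

```rust
#[derive(Debug, PartialEq)]
struct Crashed;

/// Interface shared by the real driver and its shadow (hypothetical).
trait NetDev {
    fn send(&mut self, pkt: u32) -> Result<u32, Crashed>;
}

/// Test driver that "crashes" once it has served `calls_left` requests.
struct FlakyDriver {
    calls_left: u32,
}

impl NetDev for FlakyDriver {
    fn send(&mut self, pkt: u32) -> Result<u32, Crashed> {
        if self.calls_left == 0 {
            return Err(Crashed);
        }
        self.calls_left -= 1;
        Ok(pkt)
    }
}

/// Shadow that wraps a driver and exposes an identical interface.
struct Shadow<D: NetDev, F: Fn() -> D> {
    driver: D,
    restart: F, // recreates and re-initializes the driver
    restarts: u32,
}

impl<D: NetDev, F: Fn() -> D> NetDev for Shadow<D, F> {
    fn send(&mut self, pkt: u32) -> Result<u32, Crashed> {
        match self.driver.send(pkt) {
            Ok(v) => Ok(v),
            Err(Crashed) => {
                // Driver crashed: start a fresh instance and retry once.
                self.driver = (self.restart)();
                self.restarts += 1;
                self.driver.send(pkt)
            }
        }
    }
}

fn main() {
    let mut dev = Shadow {
        driver: FlakyDriver { calls_left: 1 },
        restart: || FlakyDriver { calls_left: 1 },
        restarts: 0,
    };
    println!("{:?}", dev.send(10)); // served by the first driver instance
    println!("{:?}", dev.send(11)); // first instance fails, shadow restarts it
    println!("restarts: {}", dev.restarts);
}
```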

Evaluation

System setup

• 2 x Intel E5-2660 v3 10-core CPUs at 2.60 GHz (Haswell EP)
• Disabled: Hyper-threading, Turbo boost, CPU Idle states
• Linux and DPDK benchmarks run on version 4.8.4
• RedLeaf benchmarks run on

Communication costs

Operation                                             Cycles
seL4                                                     834
VMFUNC                                                   169
VMFUNC-based call/reply invocation                       396
RedLeaf cross-domain invocation                          124
RedLeaf cross-domain invocation (passing an RRef)        141
RedLeaf cross-domain invocation via shadow               279
RedLeaf cross-domain via shadow (passing an RRef)        297

Language overheads: C vs Rust

[Figure: TSC ratio normalized to C (y-axis, 0 to 1.5) versus number of elements in the hash table as a power of two (x-axis, 12 to 26), for idiomatic-rust-set(), idiomatic-rust-get(), c-style-rust-set(), and c-style-rust-get()]

• Hashtable - (FNV hash, open addressing, <8B, 8B>)
• C, Idiomatic Rust, C-style Rust
• C-style Rust: no higher-order functions; usize, usize
• Idiomatic Rust - Option<(usize, usize)>
• Vary the size (2^12 to 2^26 at 75% full)
• Idiomatic Rust - 25% overhead, C-style - closer to C
• Cycles for a set on 2^20 (C - 51, idrust - 81, cst-rust - 48)
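For orientation, a minimal sketch of the kind of table being measured: open addressing with linear probing over `Option<(usize, usize)>` slots. This is not the benchmark code. The choice of the FNV-1a variant, the `Table` type, and its method names are assumptions.

```rust
const FNV_OFFSET: u64 = 0xcbf29ce484222325; // 64-bit FNV offset basis
const FNV_PRIME: u64 = 0x100000001b3;       // 64-bit FNV prime

/// FNV-1a over the little-endian bytes of the key (variant assumed).
fn fnv1a(key: usize) -> u64 {
    let mut h = FNV_OFFSET;
    for &b in key.to_le_bytes().iter() {
        h ^= b as u64;
        h = h.wrapping_mul(FNV_PRIME);
    }
    h
}

/// Open-addressing table; the slot count is a power of two.
struct Table {
    slots: Vec<Option<(usize, usize)>>,
    mask: usize,
}

impl Table {
    fn with_capacity_pow2(bits: u32) -> Self {
        let n = 1usize << bits;
        Table { slots: vec![None; n], mask: n - 1 }
    }

    /// Inserts or updates; returns false only if the table is full.
    fn set(&mut self, key: usize, value: usize) -> bool {
        let mut i = (fnv1a(key) as usize) & self.mask;
        for _ in 0..self.slots.len() {
            match self.slots[i] {
                // Slot taken by a different key: probe the next slot.
                Some((k, _)) if k != key => i = (i + 1) & self.mask,
                // Empty slot or same key: store here.
                _ => {
                    self.slots[i] = Some((key, value));
                    return true;
                }
            }
        }
        false
    }

    fn get(&self, key: usize) -> Option<usize> {
        let mut i = (fnv1a(key) as usize) & self.mask;
        for _ in 0..self.slots.len() {
            match self.slots[i] {
                Some((k, v)) if k == key => return Some(v),
                Some(_) => i = (i + 1) & self.mask,
                None => return None,
            }
        }
        None
    }
}

fn main() {
    let mut t = Table::with_capacity_pow2(4); // 16 slots
    for k in 0..12 {
        t.set(k, k * 10); // fill to 75%, as in the benchmark setup
    }
    println!("{:?} {:?}", t.get(5), t.get(100));
}
```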

Case Study: Device Drivers

Configurations (each runs on top of the microkernel)

• redleaf-driver: Ixgbe Driver
• redleaf-domain: RedLeaf User -> Proxy -> Ixgbe Driver
• rv6-domain: rv6 User -> Proxy -> rv6 Core -> Proxy -> rv6 Net -> Proxy -> Ixgbe Driver
• redleaf-shadow: RedLeaf User -> Proxy -> Shadow Driver -> Proxy -> Ixgbe Driver
• rv6-shadow: rv6 User -> Proxy -> rv6 Core -> Proxy -> rv6 Net -> Proxy -> Shadow Driver -> Proxy -> Ixgbe Driver

Ixgbe performance benchmark

[Figure: packets per second (millions, 0 to 20) for Tx-64-1, Tx-64-32, Rx-64-1, and Rx-64-32 across Linux, DPDK, redleaf-driver, redleaf-domain, redleaf-shadow, rv6-domain, and rv6-shadow, with line rate marked. Bar labels visible in the plot: 0.89 M, 2.8 M, 6.7 M, 6.7 M, 6.5 M, 4 M, 6.5 M, 0.89 M, 2.9 M, 2.4 M]

• Linux application (2921 cycles)
• DPDK (388 cycles) and RedLeaf driver (400 cycles)
• Redleaf-domain and Redleaf-shadow
• Rv6-domain and Rv6-shadow

Application Benchmarks

• Maglev load balancer
• Network-attached key-value store
• A minimal webserver

Application benchmarks: Maglev

[Figure: packets per second (millions, 0 to 16) for maglev-32 across Linux, DPDK, redleaf-driver, redleaf-domain, redleaf-shadow, rv6-domain, and rv6-shadow, with line rate marked. Bar labels visible in the plot: 1 M, 9.7 M, 7.2 M, 5.3 M, 5.1 M]

• Load balancer developed by Google
• Lookup or insert into the flow tracking table
• Compare C vs Rust Maglev versions
• Linux Application - Limited due to synchronous socket interface
• DPDK Application (batch of 32) - 9.7 Mpps per-core
• Redleaf-driver - 7.2 Mpps
• Rv6-domain - 5.3 Mpps, Rv6-shadow - 5.1 Mpps
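The per-packet "lookup or insert" step can be sketched with a standard hash map. This is only an illustration: `FlowKey`, `FlowTable`, and `pick_backend` are hypothetical, and the XOR-and-modulo backend choice is a stand-in for Maglev's consistent-hashing lookup table, which is not reproduced here.

```rust
use std::collections::HashMap;

/// 5-tuple identifying a flow (hypothetical layout).
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct FlowKey {
    src_ip: u32,
    dst_ip: u32,
    src_port: u16,
    dst_port: u16,
    proto: u8,
}

/// Stand-in backend selection; real Maglev uses a consistent-hashing table.
fn pick_backend(key: &FlowKey, n: usize) -> usize {
    let mixed = key.src_ip ^ key.dst_ip ^ key.src_port as u32 ^ key.dst_port as u32 ^ key.proto as u32;
    mixed as usize % n
}

struct FlowTable {
    flows: HashMap<FlowKey, usize>,
    backends: usize,
}

impl FlowTable {
    fn new(backends: usize) -> Self {
        FlowTable { flows: HashMap::new(), backends }
    }

    /// Returns the backend already recorded for this flow, or picks one
    /// and records it so later packets of the flow go to the same place.
    fn backend_for(&mut self, key: FlowKey) -> usize {
        let n = self.backends;
        *self.flows.entry(key).or_insert_with(|| pick_backend(&key, n))
    }
}

fn main() {
    let mut table = FlowTable::new(4);
    let key = FlowKey { src_ip: 0x0a000001, dst_ip: 0x0a000002, src_port: 1234, dst_port: 80, proto: 6 };
    let first = table.backend_for(key);  // inserts the flow
    let second = table.backend_for(key); // finds the existing entry
    println!("{} {}", first, second);
}
```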

Application: Key Value Store

[Figure: packets per second (millions, 0 to 8) for c-dpdk, redleaf-driver, and redleaf-unsafe on the configurations 8-8-1M, 8-8-16M, 16-64-1M, 64-64-1M, 16-64-16M, and 64-64-16M]

• Network attached key-value store
• FNV Hash with open addressing (linear probing)
• Rust (C-Style) vs a DPDK application
• Various key-value sizes (<8B, 8B>, <16B, 64B>, <64B, 64B>)
• Achieves 61-86% performance (extend_from_slice() x 3)
• With Unsafe Rust 85-94% performance of the C DPDK

Device driver recovery

[Figure: read throughput (MB/sec, 0 to 2000) and write throughput (MB/sec, 0 to 800) over time (0 to 10 sec)]

• Rv6 program <-> in-memory block device
• Trigger crash every second
• Small drop in performance
• Read: 2062 MB/s (restart), 2164 MB/s (without restart)
• Writes: 356 MB/s (restart), 423 MB/s (without restart)

Conclusion

• Provides a collection of mechanisms for enabling isolation
  • Heap isolation, exchangeable types, ownership tracking, interface validation, cross-domain call proxying
• A step forward in enabling future system architectures
  • Secure kernel extensions, fine-grained access control, transparent recovery, etc.

Source code

https://mars-research.github.io/redleaf/

Thank you!