Region-Based Memory Management
Total Page:16
File Type:pdf, Size:1020Kb
Region-based Memory Management Advanced Operating Systems Lecture 10 Lecture Outline • Rationale • Stack-based memory management • Region-based memory management • Ownership • Borrowing • Benefits and limitations 2 Rationale • Garbage collection tends to have unpredictable timing and high memory overhead • Real-time collectors exist, but are uncommon and have significant design implications for applications using them • Manual memory management is too error prone • Region-based memory management aims for a middle ground between the two approaches • Safe, predictable timing • Limited impact on application design 3 Stack-based Memory Management • Automatic allocation/deallocation of variables on the stack is common and efficient: #include <math.h> #include <stdio.h> #include <stdlib.h> static double pi = 3.14159; Global variables static double conic_area(double w, double h) { double pi = 3.14159 double r = w / 2.0; double a = pi * r * (r + sqrt(h*h + r*r)); Stack frame for main() return a; double width = … } double height = … double area = … int main() { double width = 3; Stack frame for conic_area() double height = 2; double area = conic_area(width, height); double w = … double h = … double r = … printf("area of cone = %f\n", area); double a = … return 0; } 4 Stack-based Memory Management • Hierarchy of regions corresponding to call stack: • Global variables • Local variables in each function • Lexically scoped variables within functions double vector_avg(double *vec, int len) { double sum = 0; Lifetime of sum – local variable in function for (int i = 0; i < len; i++) { Lifetime of i – lexically scoped sum += vec[i]; } return sum / len; } • Variables live within regions, and are deallocated at end of region scope 5 Stack-based Memory Management • Limitation: requires data to be allocated on stack • Example: int hostname_matches(char *requested, char *host, char *domain) { char *tmp = malloc(strlen(host) + strlen(domain) + 2); sprintf(tmp, “%s.%s”, host, domain); if (strcmp(requested, host) == 0) { return 1; } if (strcmp(requested, tmp) == 0) { return 1; } return 0; } The local variable tmp (pointer of type char *) is freed when the function returns; the allocated memory is not freed • Heap storage has to be managed manually 6 Region-based Memory Allocation • Stack allocation effective within a narrow domain – can we extend the ideas to manage the heap? • Create pointers to heap allocated memory • Pointers are stored on the stack and have lifetime matching the stack frame – pointers have type Box<T> for a pointer to heap allocated T • The heap allocation has lifetime matching that of the Box – when the Box goes out of scope, the heap memory it references is freed • i.e., the destructor of the Box<T> frees the heap allocated T • Efficient, but loses generality of heap allocation since ties heap allocation to stack frames 7 Region-based Memory Management • For effective region-based memory management: • Allocate objects with lifetimes corresponding to regions • Track object ownership, and changes of ownership: • What region owns each object at any time • Ownership of objects can move between regions • Deallocate objects at the end of the lifetime of their owning region • Use scoping rules to ensure objects are not referenced after deallocation • Example: the Rust programming language • Builds on previous research with Cyclone language (AT&T/Cornell) • Somewhat similar ideas in Microsoft’s Singularity operating system 8 Returning Ownership of Data • Returning data from a function causes it to outlive the region in which it was created: Lifetime of local variables const PI: f64 = 3.14159; in area_of_cone fn area_of_cone(w : f64, h : f64) -> f64 { let r = w / 2.0; let a = PI * r * (r + (h*h + r*r).sqrt()); return a; } fn main() { let width = 3.0; Lifetime of a let height = 2.0; let area = area_of_cone(width, height); println!("area = {}", area); } 9 Returning Ownership of Data • Runtime must track changes in ownership as data is returned • Copies made of stack-allocated local variables; original deallocated, copy has lifetime of stack-allocated local variable in calling function • Allows us to return a copy of a Box<T> that references a heap allocated value of type T • Effective with a reference counting implementation • Creating the new Box<T> temporarily increases the reference count on the heap-allocated T • The original box is then immediately deallocated, reducing the reference count again • (An optimised runtime can eliminate the changes to the reference count) • The heap-allocated T is deallocated when the box goes out of scope of the outer region • Allows data to be passed around, if it always has a single owner 10 Giving Ownership of Data % cat consume.rs • Ownership of parameters passed fn consume(mut x : Vec<u32>) { to a function is transferred to that x.push(1); function } Lifetime of a • Deallocated when function ends, unless it fn main() { returns the data let mut a = Vec::new(); • Data cannot be later used by the calling function – enforced at compile time a.push(1); a.push(2); Ownership of a transferred to consume() consume(a); println!("a.len() = {}", a.len()); } % rustc consume.rs consume.rs:15:28: 15:29 error: use of moved value: `a` [E0382] consume.rs:15 println!("a.len() = {}", a.len()); ^ 11 Borrowing Data % cat borrow.rs • Functions can borrow references to fn borrow(mut x : &mut Vec<u32>) { data owned by an enclosing scope x.push(1); } • Does not transfer ownership of the data A mutable reference • Naïvely safe to use, since live longer than fn main() { the function let mut a = &mut Vec::new(); a.push(1); a.push(2); • Can also return references to input parameters passed as references borrow(a); • Safe, since these references must live longer println!("a.len() = {}", a.len()); than the function } % rustc borrow.rs % ./borrow a.len() = 3 % 12 Problems with Naïve Borrowing % cat borrow.rs • The borrow() function changes fn borrow(mut x : &mut Vec<u32>) { the contents of the vector x.push(1); } • But – it cannot know whether it is fn main() { safe to do so let mut a = &mut Vec::new(); • In this example, it is safe If was iterating over the contents of a.push(1); • main() the vector, changing the contents might lead a.push(2); to elements being skipped or duplicated, or to a result to be calculated with inconsistent borrow(a); data println!("a.len() = {}", a.len()); • Known as iterator invalidation } % rustc borrow.rs % ./borrow a.len() = 3 % 13 Safe Borrowing • Rust has two kinds of pointer: • To change an object: • &T – a shared reference to an immutable • You either own the object, and it is not object of type T marked as immutable; or • &mut T – a unique reference to a mutable • You have the only &mut reference to it object of type T • Runtime system controls pointer • Prevents iterator invalidation ownership and use • The iterator requires an &T reference, so • An object of type T can be referenced by other code can’t get a mutable reference one or more references of type &T, or by to the contents to change them: exactly 1 reference of type &mut T, but fn main() { not both let mut data = vec![1, 2, 3, 4, 5, 6]; for x in &data { fails, since push takes Cannot get an reference to data data.push(2 * x); • &mut T an &mut reference of type T that is marked as immutable } } • Allows functions to safely borrow enforced at compile time objects – without needing to give away ownership 14 Benefits • Type system tracks ownership, turning run-time bugs into compile-time errors: • Prevents memory leaks and use-after-free bugs • Prevents iterator invalidation • Prevents race conditions with multiple threads – borrowing rules prevent two threads from getting references to a mutable object • Efficient run-time behaviour – timing and memory usage are predictable 15 Limitations of Region-based Systems • Can’t express cyclic data structures • E.g., can’t build a doubly linked list: Can’t get mutable reference a b c d to c to add the link to d, since already referenced by b • Many languages offer an escape hatch from the ownership rules to allow these data structures (e.g., raw pointers and unsafe in Rust) • Can’t express shared ownership of mutable data • Usually a good thing, since avoids race conditions • Rust has RefCell<T> that dynamically enforces the borrowing rules (i.e., allows upgrading a shared reference to an immutable object into a unique reference to a mutable object, if it was the only such shared reference) • Raises a run-time exception if there could be a race condition, rather than preventing it at compile time 16 Limitations of Region-based Systems • Forces programmer to consider object ownership early and explicitly • Generally good practice, but increases conceptual load early in design process – may hinder exploratory programming 17 Summary • Region-based memory management with strong Region-Based Memory Management in Cyclone ∗ Dan Grossman Greg Morrisett Trevor Jim† ownership and borrowing rules Michael Hicks Yanling Wang James Cheney Computer Science Department †AT&T Labs Research Cornell University 180 Park Avenue Ithaca, NY 14853 Florham Park, NJ 07932 danieljg,jgm,mhicks,wangyl,jcheney @cs.cornell.edu [email protected] • Efficient and predictable behaviour { } ABSTRACT control over data representation (e.g., field layout) and re- Cyclone is a type-safe programming language derived from source management (e.g., memory management). The de C. The primary design goal of Cyclone is to let program- facto language for coding such systems is C. However, in Strong correctness guarantees prevent many common bugs mers control data representation and memory management providing low-level control, C admits a wide class of danger- • without sacrificing type-safety. In this paper, we focus on ous — and extremely common — safety violations, such as the region-based memory management of Cyclone and its incorrect type casts, buffer overruns, dangling-pointer deref- static typing discipline.