Unsafe in Rust: The Abstraction Safety Contract and Public Escape

After the work I did analyzing syntactic patterns that unsafe is found in, I spent the rest of my summer working on a tool to detect a particular way that unsafe may be misused. In this post and the next I’d like to explain what motivated this analysis, and provide a brief overview of how it works.

Overview

This post begins with a review how unsafe facilitates the denotation of abstraction boundaries, and proposes a set of rules for how these abstractions should work: the Abstraction Safety Contract. We then discuss a particular method by which the Abstraction Safety contract can be violated: Public Escape. We conclude by beginning to design a static analysis that can reason about Public Escape.

Unsafe as an Abstraction Boundary

Early this summer Niko Matsakis wrote about how Rust’s unsafe feature should be seen as a tool to declare and interact with abstraction boundaries that have more requirements than Rust’s type system allows them express. This perspective sheds light on the meaning of both unsafe blocks and unsafe functions, so I’ll summarize the key takeaways here while introducing a new term which will help us talk about them: safety conditions.

When a function is labelled as unsafe, some responsibility for memory safety is being deferred. The unsafe flag that means that some inputs would cause the function to execute in a memory-unsafe way. Its memory safety is contingent on additional requirements being met and callers of the function should read its documentation and make sure that the inputs satisfy those requirements. We’ll talk about these additional requirements frequently, so let’s give them a name: safety conditions.

A great example of this is the builtin slice method:

unsafe fn get_unchecked(&self, index: usize) -> &T;

The reason that this function is unsafe is given in its documentation:

Returns a pointer to the element at the given index, without doing bounds checking"

That is, the safety condition for this function is that the index is in-bounds for the slice. Any caller of this function must ensure this safety condition is met.

This is where unsafe blocks come in. When a programmer uses an unsafe block, they’re effectively saying “I understand all the unsafe operations I’m performing, I know their safety conditions, and I’ll make sure those conditions are met”. In this way, unsafe blocks take responsibility for the memory safety of the code that they contain.

An example of an unsafe block that guarantees that get_unchecked’s safety conditions are being met is this one:

fn get_from_slice<T>(slice: &[T], index: usize) -> Option<&T> {
    if index < slice.len() {
        unsafe { Some(slice.get_unchecked(index)) }
    } else {
        None
    }
}

The index < slice.len() check guarantees that when the unsafe block runs, the required safety condition (“index is in-bounds for slice”) is met.

These observations about how unsafe blocks meet safety conditions and unsafe functions defer safety conditions can help us build a precise specification for how unsafe constructs should work. Specifically, I’d like to introduce a contract for the correct use of unsafe, a sort of “Abstraction Safety Contract” which specifies how unsafe blocks and functions should interact with safety conditions:

The Abstraction Safety Contract

unsafe blocks should
- guarantee they’ll meet the safety conditions of contained code
unsafe functions should
- guarantee they’ll meet the safety conditions of contained code
  - so long as their own (documented) safety conditions are met

In contexts where the meaning is clear, we’ll abbreviate Abstraction Safety Contract as ASC.

Violations of the Abstraction Safety Contract: Escape Through Public Interfaces

Now that we have a contract for code that uses unsafe, we can start thinking about whether particular examples respect or violate this contract. In particular, we’ll explore a particular class of violations of the ASC: safety conditions escaping through public interfaces. The central motivation for thinking about public escape is that if safety conditions escape a safe public interface, then that interface cannot uphold the ASC.

To get a sense of what public escape is and see why it constitutes a violation of the ASC, we’ll look at a few examples.

Example A

pub fn deref(ptr: *const i32) -> i32 {
    unsafe {
        *ptr
    }
}

In Example A, deref is a public, safe function that takes an arbitrary pointer an dereferences it. It does so using an unsafe block, and we can reason about whether this block is being used in a way that conforms to the Abstraction Safety Contract.

Dereferencing the raw pointer inside the unsafe block creates a safety condition: “ptr must be dereferenceable”. Since the value of ptr come from the caller of deref, and deref is a public function, this critical value escapes through a public interface. For this reason, we identify this as an example of “Public Escape” - the value a safety condition depends on enters the code through a public interface. Furthermore, since deref claims to be ‘safe’, its effectively places no restrictions on how it should be called. This means that deref has failed to ensure the above safety condition is met - it has no way to ensure that ptr is dereferenceable, since it doesn’t even know ptr’s origin. Because the safety condition escapes, deref is in violation of the Abstraction Safety Contract¹.

Example B

/// Dereferences the input `ptr`.
/// NB: `ptr` _must_ be valid for this function to be memory safe!
unsafe fn deref_helper(ptr: *const i32) -> i32 {
    *ptr2
}
fn identity(ptr: *const i32) -> *const i32 {
    ptr
}
pub fn deref(ptr: *const i32) -> i32 {
    unsafe {
        let ptr2 = identity(ptr);
        deref_helper(ptr2)
    }
}

Example B is essentially just a re-factoring of Example A:

the unsafe dereference has been pulled out into a private, unsafe function (deref_helper)
an identity function (identity) has been inserted, copying ptr into ptr2

Doing this refactoring shouldn’t make deref any safer, so we hope that this should still qualify as a violation of the Abstraction Safety Contract. To see that it does we note that

the safety condition for deref_helper is that ptr2 be dereferenceable
identity is used to set ptr2 to ptr, so ptr must also be dereferenceable
ptr, as an argument to the public deref, escapes through a public interface.

So once again the value a safety condition depends on escapes through a public interface, making it impossible to guarantee that the value in question will satisfy the safety condition. For that reason, this function too is failing to uphold the Abstraction Safety Contract.

Example C

Now lets consider a negative example: a case where a safety condition doesn’t escape.

pub fn identity(x: i32) -> i32 {
    let p = &x as *const i32;
    unsafe {
        *p
    }
}

In this function a raw pointer is dereferenced, creating a safety condition: “p must be dereferenceable”. However, immediately above p is set to the address of a local variable. For this reason, the safety condition does not escape the function, and furthermore we’d agree that while identity is rather bizarre, it does indeed uphold the Abstraction Safety Contract (even though we could write examples of programs that avoid public escape but still violate the ASC!).

Example D

Now we consider a final example, inspired by the intermezzOS teaching operating system. This code features memory-mapping: the fact that the pattern by which a CPU can send information to specific hardware (e.g. the VGA buffer) by writing to specified addresses (e.g. those starting at 0xb8000).

pub struct VGA {
    vga_mem_start: *mut u8,
    // ...
}

impl VGA {
    pub fn new() -> VGA {
        VGA {
            vga_mem_start: 0xb8000 as *mut u8,
            // ...
        }
    }
    pub fn write_first(&self, byte: u8) {
        unsafe {
            *(self.vga_mem_start) = byte;
        }
    }
}

At first glance, the write_first function seems problematic for the same reason that deref was in Examples A and B: it is a public, safe function that dereferences some part of its input (specifically self.vga_mem_start). However, this case is also somewhat different — write_first isn’t just dereferencing any input, it’s dereferencing a private field of VGA. Since the field is private, the fact that the safety condition “self.vga_mem_start holds a write-able address” escapes a public function doesn’t immediately mean that the safety condition has escaped the public interface.

To determine whether this condition has escaped the public interface, we’re forced to consider all public code that might write to VGA’s vga_mem_start field. If it turns out that the only such code is in VGA::new(), which puts a magic address into the field, then we know that this safety condition doesn’t escape the public interface.

Of course, we haven’t verified that the safety condition is actually met — we don’t know whether the address 0xb8000 can actually be written to, and we definitely don’t know that the address gives the correct location of the VGA Buffer. All we can say is that no safety conditions escape the public interface of VGA.

This asymmetry is important: If safety conditions escape a public interface then there must be a violation of the Abstraction Safety Contract, but a lack of escape does not prove that the ASC is upheld.

Hopefully by now you have some intuition for what public escape is, and have a good understanding of why it’s problematic for safety conditions to escape. In the next section we’re going to start designing an automatic tool that can reason about public escape, which should help further solidify these ideas.

Automating the Analysis

To reason automatically about public escape we’re going to need to break out a fairly powerful tool: dataflow analysis. If you’re interested in learning about dataflow analysis in detail, I’d recommend this course’s lectures, but I’ve also included a basic introduction that should be enough to understand this post.

Dataflow Analysis: An Introduction

I find examples helpful in building intuition for new ideas, so let’s begin with a question that dataflow analysis could be used to answer: “Which variables in a function are equal to 5?”. Consider this function:

fn my_func(flag: bool) {
    let x = 5;
    let y = 6;
    let z = x;
    let mut a = 0;
    if flag {
        a = x
    } else {
        a = y
    }
    // Rest of the function
}

The central technique of dataflow analysis is to step through a program, determining which facts are true at each point. At branches in control flow the analysis is often unable to tell which branch occurs, so both are explored and the resulting facts from each must be reconciled afterwards.

In the case of determining which variables hold the value 5, a simple analysis would reason about facts such as “var {must,must not,may} be 5”, and would use rules such as:

let var = 5 means “var must be 5”
let var = /* other literal here */ means that “var must not be 5”
let var1 = var2 means that if “var2 {must,must not,may} be 5” then “var1 {must,must not,may} be 5”
At points where control flows merge (e.g. the end of an if/else construct), the analyses from the two flows must also be merged. If one branch says “var1 must be 5”, and the other says “var1 must not be 5”, then we must conclude that “var1 may be 5”, since we can’t tell which branch would actually run.

As the analysis move forward through my_func, it would determine the following:

fn my_func(flag: bool) {
    let x = 5;     // x must be 5
    let y = 6;     // x must be 5, y must not be 5
    let z = x;     // x must be 5, z must be 5, y must not be 5
    let mut a = 0; // x must be 5, z must be 5, y must not be 5, a must not be 5
    if flag {
        a = x      // x must be 5, z must be 5, y must not be 5, a must be 5
    } else {
        a = y      // x must be 5, z must be 5, y must not be 5, a must not be 5
    }              // Merge: (a must be 5) + (a must not be 5) = (a may be 5)

                   // x must be 5, z must be 5, y must not be 5, a may be 5

    // Rest of the function
}

Now, knowing whether an integer variable is 5 may not seem particularly useful, but this analysis could be generalized to do constant evaluation. In particular, imagine if we could determine whether boolean variables were true or false: this could enable a compiler to determine which arm of an if/else gets executed and omit the unused arm!

Dataflow analyses (and similar static analyses) are also used for many other purposes: they can determine which variables are live, which pointers alias each other, and even how long borrows should last (in Rust).

Dataflow Analysis: Public Escape

As hinted at, we can also use dataflow analysis to detect the public escape of safety conditions. To do this we’ll treat safety conditions as our facts, and reason about how they flow through a program. The trick is, this analysis is actually a backward analysis: safety conditions are created by unsafe operations and must be satisfied before those operations (not after).

To get a sense of how such an analysis would work, I’ll sketch a few of the rules it would use, and then we’ll look at how the analysis handles Examples A and B from earlier.

*my_raw_ptr means that “my_raw_ptr must be deref-able” (given that my_raw_ptr is a raw pointer).
let var1 = var2 means that any conditions that must hold for var1 must hold for var2 and var1 no longer needs to satisfy any conditions².
let var1 = &var2 means that the condition “var1 must be deref-able” does not flow any further (or escape)³.

Example A

When we use those rules to reason about Example A, our analysis determines the following (read the comments from bottom to top, since this is a backward analysis):

pub fn deref(ptr: *const i32) -> i32 {
             // ^ ptr must be deref-able
    unsafe {
             // ptr must be deref-able
        *ptr
             // (no conditions)
    }
}

The analysis determines that when the function begins the argument ptr must be dereferenceable. Because there are safety conditions involving deref’s arguments we say that these conditions escape the function. Furthermore, since deref is a safe public function, this function escape also constitutes a violation of the Abstraction Safety Contract.

Example B

Recall example B (documentation omitted):

unsafe fn deref_helper(ptr: *const i32) -> i32 {
    *ptr2
}
fn identity(ptr: *const i32) -> *const i32 {
    ptr
}
pub fn deref(ptr: *const i32) -> i32 {
    unsafe {
        let ptr2 = identity(ptr);
        deref_helper(ptr2)
    }
}

In using a dataflow analysis to understand whether any safety conditions escape the public function deref, we’re forced to do an inter-procedural analysis: an analysis that can reason about the significance of one function calling another. Let’s step through how analysis might do this, starting with the end of deref.

The last operation that deref performs is calling deref_helper on ptr2. Our analysis would thus have to analyze deref_helper at that point, and determine any safety conditions created by calling it. That analysis would conclude that the first argument to deref_helper must be deref-able, a condition that would be brought back into deref’s analysis:

pub fn deref(ptr: *const i32) -> i32 {
    unsafe {
        let ptr2 = identity(ptr);
                                   // ptr2 must be deref-able
        deref_helper(ptr2)
                                   // (no conditions)
    }
}

The analysis would then encounter the call to identity, and assignment of identity’s result to ptr2. We might be tempted to say that identity should then be analyzed, just as deref_helper was, but we have to be careful - we can’t just analyze identity in isolation - we have to analyze it with some extra information from the site where it is called (in this case, the knowledge that identity’s return value must be dereferenceable). This extra information is the context in which identity in analyzed, and by handling contexts our analysis models how data can enter a function through its arguments and leave through its return value⁴.

When identity is analyzed in the context in which its return value must be dereferenceable, the result is that its first argument must also be dereferenceable. Bringing this conclusion back into our analysis of deref and finishing that analysis yields the following:

pub fn deref(ptr: *const i32) -> i32 {
                                   // ^ ptr must be deref-able
    unsafe {
                                   // ptr must be deref-able
        let ptr2 = identity(ptr);
                                   // ptr2 must be deref-able
        deref_helper(ptr2)
                                   // (no conditions)
    }
}

As in Example A, the analysis finds that deref requires that its first argument must be dereferenceable — a safety condition has escaped the function. Since deref is both public and safe this escape is also an instance of a safety condition escaping a public interface and indicates a violation of the Abstraction Safety Contract.

Example C

Now lets consider how our analysis handles an example with no public escape. Remember to read the comment backwards, since the analysis is a backward one!

pub fn identity(x: i32) -> i32 {
                              // ^ (no conditions)
    let p = &x as *const i32;
                              // (p must be deref-able)
    unsafe {
                              // (p must be deref-able)
        *p
                              // (no conditions)
    }
}

While a condition “p must be dereferenceable” is still generated, that condition is ultimately killed⁵ by the assignment let p = &.... For that reason the analysis would conclude that this function features no public escape, which is correct.

Next Time: Example D and Beyond

We’ve seen how a public escape analysis might work on a few functions, but Rust is a complex language and there is much more to do. In particular, we need:

to reason about the private fields of structs, as found in Example D.
to understand that different fields of structs hold different values.
to understand that p and *p refer to different locations/values
to reason about calls to functions that we don’t have the source for (perhaps in external dependencies or boxed closures)
to reason about issues raised by aliasing (what if x and *p denote the same memory location)
to make sure that recursive functions and other examples of complex dependencies between functions are handled correctly.

In my next post I’ll touch on some of these, and (hopefully) include the results of running the analysis on various Rust codebases.

Technically we might say that the unsafe block inside of deref is violating the ASC, but since it is deref that sets up the state of the program when the unsafe block runs, it’s convenient to “hold deref accountable”. ↩︎
The reason that let var1 = var2 frees var1 of all conditions is that is means the value of var1 will be overwritten by var2’s value. Once we start dealing properly with pointers, arrays, and all the complexities of real programs we’ll have to be much more careful about whether lval1 = lval2 really overwrites our representation of lval1 and whether our representation of lval1 might alias some other lvalue representation involved with safety conditions. For now a simple understanding makes for a smooth exposition. ↩︎
Note that this doesn’t guarantee that the condition is met - just that it doesn’t escape. ↩︎
A function can also cause externally-visible dataflow by writing to lvalues pointed at by its arguments. Consider fn swap<T>(a: &mut T, b: &mut T);. Nevertheless, thinking about the return value is a good way to start. ↩︎
‘Killed’ is the technical term for removing facts from a datflow. ↩︎