Rust's Cell<T>

Cell<T> is a type in Rust's standard library (std::cell::Cell) that provides interior mutability: a technique that lets us change data even when we only have an immutable (shared) reference to its container. It does so while preserving Rust's strict safety guarantees.

How Cell Achieves Interior Mutability

Cell<T> implements interior mutability by moving values in and out of the cell. In particular:

  • No Direct Mutable References: Through a shared reference to the Cell, we can never obtain an &mut T to the inner value.
  • Value Extraction by Replacement: A value that is not Copy cannot be read in place; to get it out, we must replace it with another value (or consume the Cell).

These constraints ensure that no reference to the inner value exists while it is being changed through a shared reference, so a mutation can never invalidate a borrow.
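As a small illustration of the point above, the sketch below passes the same cell to a function twice. The helper name add_into is hypothetical and only exists for this example; it shows that aliasing shared references are harmless because every access copies a value out or moves a value in.

```rust
use std::cell::Cell;

// Hypothetical helper: `total` and `amount` are allowed to be the same cell.
fn add_into(total: &Cell<i32>, amount: &Cell<i32>) {
    // `get` copies the value out and `set` moves a new value in;
    // no reference to the inner i32 is ever created.
    total.set(total.get() + amount.get());
}

fn main() {
    let a = Cell::new(5);
    let b = Cell::new(7);
    add_into(&a, &b);
    assert_eq!(a.get(), 12);

    // Two shared references to the same cell: still sound.
    add_into(&a, &a);
    assert_eq!(a.get(), 24);
}
```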

Provided Methods

For all types T, Cell<T> offers:

  • replace: Replaces the current interior value with a new one and returns the old value.
  • into_inner: Consumes the Cell<T> and returns the inner value.
  • set: Replaces the interior value (dropping the old one).

For types that implement Copy:

  • get: Retrieves the current value by copying it.

For types that implement Default:

  • take: Replaces the current value with Default::default() and returns the old value.

When to Use Cell

  1. Mutable Fields in Immutable Structs: We can have a field that can change even when the struct is not declared as mutable:
use std::cell::Cell;

struct Counter {
    count: Cell<u32>,
}

impl Counter {
    fn increment(&self) {
        self.count.set(self.count.get() + 1);
    }
}
  2. Thread-Local Data: Cell<T> is not thread-safe (it does not implement Sync), so it suits data that is shared and mutated within a single thread.
  3. Optimizing Performance: Use it for small, copyable data (e.g., integers or booleans) to avoid runtime borrow checks. For larger or non-Copy types, consider RefCell.
  4. No Runtime Overhead: Cell's safety comes from its API and is checked at compile time, so there are no runtime borrow checks (unlike RefCell<T>).
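A further sketch of the "mutable field" use case, assuming a hypothetical Node type with a visited flag (neither name comes from the original note). The flag is flipped through &self with replace, which also reports the previous state in one step.

```rust
use std::cell::Cell;

// Hypothetical type for illustration only.
struct Node {
    visited: Cell<bool>,
}

impl Node {
    fn new() -> Self {
        Node { visited: Cell::new(false) }
    }

    // Marks the node as visited and returns true only on the first visit.
    fn visit(&self) -> bool {
        // `replace` stores `true` and hands back the previous flag.
        !self.visited.replace(true)
    }
}

// Counts how many nodes in `path` are seen for the first time.
fn count_first_visits(path: &[&Node]) -> usize {
    path.iter().filter(|n| n.visit()).count()
}

fn main() {
    let a = Node::new();
    let b = Node::new();
    // `a` appears twice, but only its first visit counts.
    assert_eq!(count_first_visits(&[&a, &b, &a]), 2);
}
```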

Limitations

  1. No Direct References: Through a shared reference to the Cell, we cannot obtain a reference (&T or &mut T) to the inner data. Instead, we must work with methods like get, set, replace, or take.
  2. Not Thread-Safe: Since Cell<T> is designed for single-threaded usage, we must use other synchronization primitives (e.g., Mutex or atomic types) for cross-thread shared mutability.
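For the cross-thread case mentioned above, a minimal sketch using an atomic type might look like the following. The function name count_across_threads is hypothetical, and Ordering::Relaxed is an assumption that is adequate here only because the scope joins every thread before the final load.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

// Hypothetical helper: increments one shared counter from several threads.
fn count_across_threads(threads: u32, per_thread: u32) -> u32 {
    let counter = AtomicU32::new(0);
    // Scoped threads may borrow `counter`; all of them are joined
    // before `thread::scope` returns.
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..per_thread {
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });
    counter.load(Ordering::Relaxed)
}

fn main() {
    assert_eq!(count_across_threads(4, 1000), 4000);
}
```

A Cell<u32> in place of the AtomicU32 would not compile here, because &Cell<u32> cannot be shared across threads.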

Examples

With Copy Types:

use std::cell::Cell;

struct MyStruct {
    x: i32,
    y: Cell<i32>,
}

fn main() {
    let s = MyStruct {
        x: 42,
        y: Cell::new(42),
    };

    s.y.set(100); // Mutation allowed, even though `s` isn't declared as `mut`
    assert_eq!(s.y.get(), 100);
    // s.x = 100; // Error: cannot assign to `s.x`, as `s` is not declared as `mut`
    assert_eq!(s.x, 42); // the plain field is unchanged
}

With Non-Copy Types:

use std::cell::Cell;

fn main() {
    let cell = Cell::new(String::from("Hello"));
    // `String` is not `Copy`, so `get` is unavailable for this cell.
    // `replace` swaps in a new value and returns the old one.
    let old = cell.replace(String::from("World"));
    assert_eq!(old, "Hello");
    // `take` returns the current value and leaves `String::default()` behind.
    let old = cell.take();
    assert_eq!(old, "World");
    // `into_inner` consumes the cell; because of the earlier `take`,
    // the remaining value is the default (an empty string).
    assert_eq!(cell.into_inner(), "");
}

How It Works

  • Cell internally uses UnsafeCell (the backbone of interior mutability in Rust) but provides a safe API.
  • It ensures safety by:
    1. Disallowing references to the inner value (prevents aliasing).
    2. Allowing mutation only through methods that copy or replace the entire value.

UnsafeCell Explained Simply

UnsafeCell<T> is the low-level building block for Rust's interior mutability. It allows us to mutate data even through an immutable reference (&T), but it requires manual safety guarantees from us. Think of it as a "backdoor" to bypass Rust's default borrowing rules, with the caveat that we are responsible for ensuring safety.
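A minimal sketch of using UnsafeCell directly, assuming a hypothetical bump function: the call to get() is safe, while dereferencing the returned raw pointer is where we take on the responsibility described above.

```rust
use std::cell::UnsafeCell;

// Hypothetical helper: increments the value behind a shared reference.
fn bump(slot: &UnsafeCell<u32>) -> u32 {
    // SAFETY: UnsafeCell is not Sync, so no other thread can touch `slot`,
    // and this function creates no reference to the inner value that could
    // overlap with the write.
    unsafe {
        let ptr = slot.get(); // *mut u32, obtained safely
        *ptr += 1;            // mutation through the raw pointer
        *ptr                  // u32 is Copy, so this reads a copy
    }
}

fn main() {
    let slot = UnsafeCell::new(1);
    assert_eq!(bump(&slot), 2);
    assert_eq!(slot.into_inner(), 2);
}
```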

The get_mut Method in Cell

Cell<T> in Rust provides a method called get_mut(&mut self) -> &mut T, which allows obtaining a mutable reference to the inner value. This method requires a mutable reference to the Cell itself (&mut self) for several reasons:

  1. Exclusive Access: By requiring &mut self, Rust ensures that we have exclusive access to the cell, meaning no other references (mutable or immutable) exist while we're modifying it through get_mut. This aligns with Rust's borrow checker rules, preventing data races or inconsistencies.
  2. Compile-Time Safety: Requiring &mut self lets the compiler enforce this at compile time. It guarantees that no other reference to the cell is alive while the &mut T exists; otherwise, code holding a &Cell<T> could call set or replace and invalidate that reference.

The primary purpose of Cell is interior mutability without holding onto references, typically by moving or copying values in and out. Methods like set and replace modify the cell through &self without ever handing out a reference to the inner value. get_mut covers the complementary case where unique access is already guaranteed: because it takes &mut self, it can only be called when we hold a &mut Cell<T>, and at that point the ordinary borrowing rules already make the mutation safe, so no interior mutability is involved.
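A short sketch of get_mut in use, with a hypothetical append_exclusive helper. It edits a non-Copy value in place, which is not possible through &self.

```rust
use std::cell::Cell;

// Hypothetical helper: needs exclusive access to the cell itself.
fn append_exclusive(cell: &mut Cell<String>, suffix: &str) {
    // `get_mut` borrows the inner String mutably; the borrow checker
    // guarantees nobody else can reach the cell while it is alive.
    cell.get_mut().push_str(suffix);
}

fn main() {
    let mut cell = Cell::new(String::from("Hello"));
    append_exclusive(&mut cell, ", world");
    assert_eq!(cell.into_inner(), "Hello, world");
}
```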

Custom Cell Implementation

The following implementation provides a minimal version of interior mutability similar to std::cell::Cell by leveraging UnsafeCell<T>.

use std::cell::UnsafeCell;

struct MyCell<T> {
    value: UnsafeCell<T>,
}

impl<T> MyCell<T> {
    fn new(value: T) -> MyCell<T> {
        Self {
            value: UnsafeCell::new(value),
        }
    }

    fn set_direct(&self, value: T) {
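        // SAFETY: MyCell is not Sync, so no other thread can access this value.
        // Also, because we never expose references to the inner value, no
        // reference can observe this write.
        // Caveat: the assignment drops the old value in place. If T's destructor
        // could reach this same cell, it would see a value that is mid-drop, which
        // is why `set` below goes through `replace` instead.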
        unsafe {
            // we get a raw pointer to the inner value
            let ptr = self.value.get();
            // we can dereference the pointer and change the value
            *ptr = value;
        }
    }

    fn set(&self, value: T) {
        self.replace(value);
    }

    fn replace(&self, value: T) -> T {
        // SAFETY: MyCell is not Sync, so no other thread can access this value.
        // Also, because we never expose references to the inner value, this replacement is safe.
        std::mem::replace(unsafe { &mut *self.value.get() }, value)
    }

    fn into_inner(self) -> T {
        self.value.into_inner()
    }
}

impl<T: Copy> MyCell<T> {
    fn get(&self) -> T {
        // SAFETY: MyCell is not Sync, so we can assume that no other thread is accessing the value
        // and as we are using Copy types, we can safely return the value because it's a copy not a reference
        unsafe { *self.value.get() }
    }
}

impl<T: Default> MyCell<T> {
    fn take(&self) -> T {
        self.replace(Default::default())
    }
}

Key points about this implementation:

  • Thread Safety: UnsafeCell<T> is not Sync, so MyCell<T> is automatically not Sync either. This enforces single-threaded sharing, mirroring Cell<T>'s behavior.
  • No Direct References: Never expose &T or &mut T from MyCell.
  • Copy Restriction for get: Because get() is restricted to Copy types and no references to the inner value ever leak, mutation through &self is safe: there is no reference through which an invalid or changing state could be observed.
  • No Runtime Overhead: Like Cell<T>, this implementation avoids runtime checks (e.g., no borrow counters like RefCell).
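A usage sketch for MyCell follows. The type is repeated here in condensed form (without set_direct) only so that the snippet compiles on its own; the calls in main mirror the earlier Cell examples.

```rust
use std::cell::UnsafeCell;

// Condensed copy of MyCell so this example is self-contained.
struct MyCell<T> {
    value: UnsafeCell<T>,
}

impl<T> MyCell<T> {
    fn new(value: T) -> Self {
        Self { value: UnsafeCell::new(value) }
    }

    fn replace(&self, value: T) -> T {
        // SAFETY: MyCell is not Sync and never hands out references to
        // the inner value, so this short-lived &mut T is unique.
        std::mem::replace(unsafe { &mut *self.value.get() }, value)
    }

    fn set(&self, value: T) {
        self.replace(value);
    }

    fn into_inner(self) -> T {
        self.value.into_inner()
    }
}

impl<T: Copy> MyCell<T> {
    fn get(&self) -> T {
        // SAFETY: same reasoning as `replace`; T: Copy means we return a copy.
        unsafe { *self.value.get() }
    }
}

impl<T: Default> MyCell<T> {
    fn take(&self) -> T {
        self.replace(Default::default())
    }
}

fn main() {
    // Copy type: read with `get`, write with `set`.
    let n = MyCell::new(1);
    n.set(n.get() + 1);
    assert_eq!(n.get(), 2);

    // Non-Copy type: move values out with `replace`, `take`, `into_inner`.
    let s = MyCell::new(String::from("Hello"));
    assert_eq!(s.replace(String::from("World")), "Hello");
    assert_eq!(s.take(), "World");
    assert_eq!(s.into_inner(), "");
}
```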

How do set_direct and get work?

Both methods use raw pointers under the hood to access and manipulate the data stored in the UnsafeCell. The key idea is that the raw pointer gives us direct access to the memory where the value of type T is stored.

Obtaining the Raw Pointer: Both set_direct and get begin by calling self.value.get(). This method returns a raw mutable pointer (*mut T) to the inner data.

  • Representation: The raw pointer is essentially just a memory address (or a fat pointer with extra metadata for unsized types), meaning it directly points to the location in memory where the value is stored.

set_direct Method:

  • In set_direct, after obtaining the raw pointer, we dereference it using *ptr. This operation accesses the memory location directly. The assignment (*ptr = value;) then writes the new value into that location.
  • The old value is overwritten. If T implements Drop, its destructor will be called automatically as part of the assignment process, ensuring that the old value’s resources are properly released.
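The drop-on-assignment behavior can be observed with a small sketch. The Noisy type and the overwrite_and_count function are hypothetical; Noisy simply counts how many times its destructor runs.

```rust
use std::cell::{Cell, UnsafeCell};

// Hypothetical type: bumps a shared counter every time it is dropped.
struct Noisy<'a> {
    drops: &'a Cell<u32>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

// Returns the drop count right after the overwrite, and again after
// the cell itself has gone out of scope.
fn overwrite_and_count() -> (u32, u32) {
    let drops = Cell::new(0);
    let after_write;
    {
        let slot = UnsafeCell::new(Noisy { drops: &drops });
        // SAFETY: single-threaded, and no reference to the inner value exists.
        unsafe {
            // Assigning through the raw pointer drops the old Noisy first.
            *slot.get() = Noisy { drops: &drops };
        }
        after_write = drops.get();
    } // `slot` is dropped here, which drops the second Noisy.
    (after_write, drops.get())
}

fn main() {
    assert_eq!(overwrite_and_count(), (1, 2));
}
```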

Understanding the Components

  • self.value: This is an instance of UnsafeCell<T>.
  • The get Method: Calling self.value.get() returns a raw mutable pointer of type *mut T that points to the inner value. This method is marked as safe to call, but the raw pointer it produces does not come with any of Rust’s usual aliasing or mutability guarantees. Because of that, we must use an unsafe block when dereferencing it.
  • Dereferencing with '*': The * operator is used to dereference the raw pointer. The expression *self.value.get() accesses the memory location where the inner value of the UnsafeCell is stored.

The get Method (for T: Copy):

  • Similarly, get obtains the raw pointer via self.value.get() and dereferences it with *self.value.get(). Since the method returns a value of type T and T implements Copy, the data at that memory address is copied (using the layout of T) and returned.
  • The bytes at that address represent the value of type T. Dereferencing the pointer reads those bytes and interprets them as an instance of T according to the type's layout; this is valid because the cell always holds an initialized T.

Why the Copy Trait Matters in get

Copy vs. Move: For types that implement the Copy trait, this dereference produces a copy of the value. That means the original value remains intact inside the cell, and we get a duplicate of it. For types that do not implement Copy, the same expression would have to move the value out from behind the raw pointer, which the compiler rejects. Moving would mean transferring ownership of the value out of the cell, which is problematic here because:

  • Ownership Violation: The value is still stored inside the cell, and moving it out would leave the cell with an uninitialized or invalid value. This violates Rust’s ownership rules, which ensure that each piece of data has a single owner at any given time. It would leave the original location in an invalid state while it’s still accessible by its original owner.
  • Potential for Double Drop: If we moved the value, when the cell is later dropped, it might try to drop an already-moved (and thus uninitialized) value, leading to undefined behavior.

So, by requiring T: Copy, we guarantee that dereferencing produces a copy (and not a move), making it safe to return the value without affecting the stored data.
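For non-Copy types, one possible workaround is to move the value out temporarily, clone it, and put it back. The helper clone_inner below is hypothetical (it is not part of std::cell::Cell) and assumes T implements both Default and Clone.

```rust
use std::cell::Cell;

// Hypothetical helper: reads a non-Copy value without leaving the cell empty.
fn clone_inner<T: Default + Clone>(cell: &Cell<T>) -> T {
    // Move the value out; the cell temporarily holds T::default().
    let value = cell.take();
    let copy = value.clone();
    // Move the original back in, dropping the temporary default.
    cell.set(value);
    copy
}

fn main() {
    let cell = Cell::new(String::from("Hello"));
    assert_eq!(clone_inner(&cell), "Hello");
    // The cell still owns its original value.
    assert_eq!(cell.into_inner(), "Hello");
}
```

At no point is the value duplicated bitwise, so the ownership and double-drop problems described above do not arise.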

Gist: https://gist.github.com/douglasmakey/2c6be5a13a7eda83aae38afc8d151650

© 2023 KungFuDev made with love / cd 💜

Heavily inspired/copied from shuttle.rs