Avoiding benchmarking pitfalls with std::hint::black_box
November 2022

When benchmarking short programs, you often run into two big problems that distort your results: (1) hardware and operating systems are full of side effects that are neither transparent nor directly controllable, and (2) compilers can optimize in unpredictable ways, so you need to inspect the IR or assembly and know your compiler intrinsics.

One such example happened while I was benchmarking a multithreaded queue. I chose my struct alignments in a way that would reduce cache coherency traffic, which should translate to a noticeable improvement in per-thread throughput on write-heavy workloads. However, I measured the exact opposite! This is an excerpt of the code, where we essentially just increment a variable:
for i in 0..0xffff {
    //...
    *head = (*head + 1) & ((1 << C) - 1);
}
We're dereferencing a *mut, incrementing the value, and storing it at the same address. We should expect to see at least one ldr and one str instruction inside the loop. Generating the corresponding arm64 assembly with RUSTFLAGS="--emit asm" cargo bench --no-run yields:
LBB0_1:
        add     w9, w9, #1
        and     w9, w9, #0xffff
        subs    x10, x10, #1
        b.ne    LBB0_1
        str     w9, [x8]
Sneaky! The last line reveals the problem with my benchmark. The updated value is written back to memory only once, after the loop has finished, which means there is barely any cache coherency traffic happening! Honestly, I would've expected LLVM to fold the whole loop into a single masked add (the loop runs 0xffff = 65535 times), but to my surprise it keeps the loop around. We can fix this in native Rust by using a compiler hint called std::hint::black_box:
for i in 0..0xffff {
    //...
    *head = (*head + 1) & ((1 << C) - 1);
    black_box(head);
}
According to the official Rust docs, black_box is "an identity function that hints to the compiler to be maximally pessimistic about what black_box could do". Ideally, the compiler treats this function as a side effect, meaning that the call alters the observable behavior of the program. The compiler should then not be allowed to elide or defer str instructions across our loop iterations like we've seen before. And to our satisfaction, after compiling the black_box variant, we find our missing load and store instructions:
LBB0_1:
        ldr     x11, [sp]
        ldr     w12, [x11]
        add     w12, w12, #1
        and     w12, w12, #0xffff
        str     w12, [x11]
        str     x9, [sp, #24]
        subs    x8, x8, #1
        b.ne    LBB0_1
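To reproduce the comparison outside of the queue code, here is a minimal sketch of both loop variants. It makes a few assumptions that are not in the original benchmark: C is set to 16 (which matches the #0xffff mask in the assembly), the function names bump_plain and bump_black_box are made up for illustration, and a safe &mut u32 stands in for the raw *mut. Because a &mut is not Copy, the sketch reborrows it explicitly before handing it to black_box.

```rust
use std::hint::black_box;

// Assumption: C = 16, matching the `#0xffff` mask seen in the assembly.
const C: u32 = 16;
const MASK: u32 = (1 << C) - 1;

/// Plain loop: the optimizer is free to keep `*head` in a register
/// and store it back only once, after the loop.
fn bump_plain(head: &mut u32, iterations: u32) {
    for _ in 0..iterations {
        *head = head.wrapping_add(1) & MASK;
    }
}

/// Same loop, but a reborrow of `head` goes through `black_box` on every
/// iteration. This asks (but does not force) the compiler to assume the
/// pointee may be observed there, so the store should stay inside the loop.
fn bump_black_box(head: &mut u32, iterations: u32) {
    for _ in 0..iterations {
        *head = head.wrapping_add(1) & MASK;
        black_box(&mut *head);
    }
}

fn main() {
    let mut a = 0u32;
    let mut b = 0u32;
    bump_plain(&mut a, 0xffff);
    bump_black_box(&mut b, 0xffff);
    // Both variants compute the same value; only the generated code differs.
    assert_eq!(a, b);
    println!("head = {a:#x}");
}
```

Both functions return the same result, so the only way to tell them apart is to build in release mode and compare the emitted assembly, as done above. Whether the store really stays in the loop depends on the compiler version and target.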
But what goes on under the hood? Being an unstable feature, the implementation of black_box has undergone several changes. The most recent one, which is likely going to be stabilized, is implemented as a rustc compiler intrinsic. If we follow the definition of copy_op, we'll arrive at this function. As long as the passed-in object is small enough, we can write a scalar value to the same memory location:
pub fn write_scalar(&mut self, range: AllocRange, val: Scalar<Prov>) -> ... {
    let range = self.range.subrange(range);
    Ok(self
        .alloc
        .write_scalar(&self.tcx, range, val)
        .map_err(|e| e.to_interp_error(self.alloc_id))?)
}
This also explains the second str instruction in our black_box variant. There are some things to keep in mind though:
  • Make sure to pass large objects via &mut T. If you pass them by value, you will end up with a memcpy even in optimized builds.
  • black_box does not guarantee anything, and only works as an advisory function. It's not an LLVM intrinsic. So manual inspection of IR or assembly is still necessary.
  • black_box is still experimental and awaits stabilization, part of which is possibly a name and documentation change. (stabilized as of 2022-12-15)
  • Creating a version of black_box that gives strict guarantees would require a top-to-bottom rework, including patching backends to support these intrinsics.
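The first point can be illustrated with a small sketch. The Big struct and the two touch_* functions are hypothetical and exist only for this example; whether the by-value variant actually compiles down to a memcpy depends on the compiler version and target, so treat it as something to check in the assembly rather than a guarantee.

```rust
use std::hint::black_box;

// Hypothetical large payload (8 KiB), used only for illustration.
struct Big {
    data: [u64; 1024],
}

/// Passes a reborrowed `&mut Big` through `black_box`: only a pointer is
/// handed over, the payload itself stays where it is.
fn touch_by_ref(big: &mut Big) {
    big.data[0] = big.data[0].wrapping_add(1);
    black_box(&mut *big);
}

/// Passes the whole struct by value: it is moved into and out of
/// `black_box`, which may be lowered to a copy of all 8 KiB.
fn touch_by_value(mut big: Big) -> Big {
    big.data[0] = big.data[0].wrapping_add(1);
    black_box(big)
}

fn main() {
    let mut big = Big { data: [0; 1024] };
    for _ in 0..4 {
        touch_by_ref(&mut big);
    }
    let big = touch_by_value(big);
    println!("first element: {}", big.data[0]);
}
```

Both variants produce the same values, so as with the loop above, the difference only shows up when comparing the generated code.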
You can find an interesting discussion about this function in the tracking issue on GitHub.
Update (December): With the release of Rust 1.66, black_box has been officially stabilized. You can find more information in the official announcement post.