Avoiding benchmarking pitfalls with std::hint::black_box
When benchmarking short programs, you often run into two big problems that skew your final results: (1) hardware and operating systems are full of side effects that are neither transparent nor directly controllable, and (2) compilers can optimize in unpredictable ways, so you need to inspect the IR or assembly and know the relevant compiler intrinsics.
One such example happened while I was benchmarking a multithreaded queue. I chose my struct alignments in a way that would reduce cache coherency traffic, which should translate to a noticeable improvement in per-thread throughput on write-heavy workloads. However, I measured the exact opposite! This is an excerpt of the code, where we essentially just increment a variable:
for i in 0..0xffff {
    //...
    *head = (*head + 1) & ((1 << C) - 1);
}
LBB0_1:
    add w9, w9, #1
    and w9, w9, #0xffff
    subs x10, x10, #1
    b.ne LBB0_1
    str w9, [x8]
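Read back as Rust, the assembly above amounts to something like the following sketch: the value stays in a register for the whole loop and is written to memory only once, after the loop ends. The function names and the `MASK` constant are hypothetical, and `C` is assumed to be 16 because the generated code masks with `#0xffff`.

```rust
// Assumption: C = 16, inferred from the `#0xffff` mask in the assembly.
const C: u32 = 16;
const MASK: u32 = (1 << C) - 1;

// The loop as written in the source: one read-modify-write per iteration.
fn bump_naive(head: &mut u32) {
    for _ in 0..0xffff {
        *head = (*head + 1) & MASK;
    }
}

// What the optimized assembly effectively does: accumulate in a local
// (a register) and store to memory a single time after the loop.
fn bump_hoisted(head: &mut u32) {
    let mut local = *head;
    for _ in 0..0xffff {
        local = (local + 1) & MASK;
    }
    *head = local;
}

fn main() {
    let (mut a, mut b) = (0u32, 0u32);
    bump_naive(&mut a);
    bump_hoisted(&mut b);
    // Both versions compute the same value; only the memory traffic differs.
    assert_eq!(a, b);
    println!("head = {a}");
}
```

If the per-iteration stores are what the benchmark is supposed to measure, the hoisted form no longer measures them, which is one plausible reason for a surprising result.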
for i in 0..0xffff {
    //...
    *head = (*head + 1) & ((1 << C) - 1);
    black_box(head);
}
LBB0_1:
    ldr x11, [sp]
    ldr w12, [x11]
    add w12, w12, #1
    and w12, w12, #0xffff
    str w12, [x11]
    str x9, [sp, #24]
    subs x8, x8, #1
    b.ne LBB0_1
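To try the `black_box` variant outside the queue, a minimal self-contained sketch could look like the one below. The `bump` function, the `C = 16` constant, and the timing harness are illustrative assumptions, not the original benchmark. Timings will vary by machine, and the generated assembly should still be checked by hand.

```rust
use std::hint::black_box;
use std::time::Instant;

// Assumption: C = 16, matching the `#0xffff` mask seen in the assembly.
const C: u32 = 16;

// Hypothetical stand-in for the queue's head update.
fn bump(head: &mut u32) {
    for _ in 0..0xffff {
        *head = (*head + 1) & ((1 << C) - 1);
        // Ask the optimizer to treat `head` as possibly observed here, which
        // in practice tends to keep the load and store inside the loop.
        // This is best effort only. The explicit reborrow (`&mut *head`)
        // avoids moving the `&mut` reference into black_box.
        black_box(&mut *head);
    }
}

fn main() {
    let mut head = 0u32;
    let start = Instant::now();
    bump(&mut head);
    let elapsed = start.elapsed();
    println!("head = {head}, elapsed = {elapsed:?}");
}
```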
pub fn write_scalar(&mut self, range: AllocRange, val: Scalar<Prov>) -> ... {
    let range = self.range.subrange(range);
    Ok(self
        .alloc
        .write_scalar(&self.tcx, range, val)
        .map_err(|e| e.to_interp_error(self.alloc_id))?)
}
- Make sure to pass large objects to black_box via &mut T. If you pass them by value, you will end up with a memcpy even in optimized builds.
- black_box does not guarantee anything and only works as an advisory function. It's not an LLVM intrinsic, so manual inspection of IR or assembly is still necessary.
- black_box is still experimental and awaits stabilization, part of which is possibly a name and documentation change. (Stabilized as of 2022-12-15.)
- Creating a version of black_box that gives strict guarantees would require a top-to-bottom rework, including patching backends to support these intrinsics.
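The first point in the list above can be sketched as follows. `Big`, `touch_by_ref`, and `touch_by_value` are hypothetical names used only for illustration. Whether a copy is actually emitted depends on the compiler version and target, so treat this as something to verify in the assembly rather than a guarantee.

```rust
use std::hint::black_box;

// Hypothetical "large object": 1024 * 8 bytes = 8 KiB.
struct Big {
    data: [u64; 1024],
}

// Preferred: only a pointer-sized reference goes through black_box.
fn touch_by_ref(big: &mut Big) {
    big.data[0] += 1;
    black_box(&mut *big);
}

// By value: the whole struct is moved through black_box, which may
// compile down to a memcpy of all 8 KiB, even in optimized builds.
fn touch_by_value(mut big: Big) -> Big {
    big.data[0] += 1;
    black_box(big)
}

fn main() {
    let mut big = Big { data: [0; 1024] };
    touch_by_ref(&mut big);
    let big = touch_by_value(big);
    println!("first element = {}", big.data[0]);
}
```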
Update (December): With the release of Rust 1.66, black_box has been officially stabilized. You can find more information in the official announcement post.