Unsigned overflow wraps. That’s the contract. C++ won’t catch it, the CPU won’t trap it. But your size calculation that wraps to a smaller buffer? That’s a memory corruption vulnerability, and it’s yours to own.
The obvious defense—check before you multiply—works until a compiler optimizer gets clever. I’ve watched codebases strip overflow guards down to nothing because the optimizer reasoned: “this multiply can’t overflow, so the check must be dead code.” You write the protection. The compiler removes it. The bug ships.
What you need is an overflow check that compilers cannot legally remove and that optimizers reduce to cheap code anyway. Daniel Lemire’s division-based check does exactly that. Wrap it in constexpr, and you shift the cost to compile-time for constant sizes: compile-time constants drop to two instructions (load and return), while runtime paths cost one comparison. I verified this on GCC 15.2.1 and Clang 21.1.8 by inspecting x86-64 assembly.
The problem is subtler than it looks
Every buffer allocation is a multiplication. width * height * channels * sizeof(pixel). If any intermediate step wraps, the final buffer is undersized. Memory corruption. That’s the vulnerability.
Containers do this too. Allocators multiply element size and count. Matrix classes compute rows * cols * sizeof(element). Any place where dimensions meet sizeof, overflow is a threat.
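A minimal sketch of the failure mode, using 32-bit arithmetic so the wrap is easy to trigger (the function and values are mine, for illustration):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical image-buffer sizing done in 32-bit arithmetic:
// the product wraps silently, with no error and no trap.
uint32_t pixel_bytes(uint32_t width, uint32_t height, uint32_t channels) {
    return width * height * channels;  // wraps mod 2^32
}

int pixel_bytes_demo() {
    // 65536 * 65536 == 2^32, which wraps to 0; times 4 channels is still 0.
    // The true size is 16 GiB, but the computed "size" is zero bytes.
    assert(pixel_bytes(65536, 65536, 4) == 0);
    assert(pixel_bytes(2, 3, 4) == 24);  // small sizes behave normally
    return 0;
}
```

An allocator handed that zero (or any wrapped-small value) happily returns a buffer far smaller than the data about to be written into it.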
Here’s where it gets nasty: unsigned overflow is defined in C++. It wraps. That’s safer than signed integer overflow (which is undefined behavior), but it creates false confidence. With signed arithmetic, the compiler is entitled to assume overflow never happens—so a guard that inspects the result of a signed operation is, by the optimizer’s logic, testing for something impossible. Dead code. With unsigned arithmetic the wrap is defined, but it’s silent, and an undersized buffer is still an undersized buffer.
The pattern I’ve seen fail: compute with signed or mixed types, then check the result for wraparound. The optimizer reasons: “this operation cannot overflow, so the check can never fire.” Check deleted. Your protection vanishes. Bug ships.
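The classic shape of a guard the optimizer is allowed to delete, sketched with signed addition for brevity (my own minimal example, not from a real codebase):

```cpp
#include <cassert>
#include <climits>

// Fragile: checks AFTER the operation. If a + b overflows int, that is
// undefined behavior, so the optimizer may assume it never happens and
// delete the comparison below entirely.
bool overflowed_broken(int a, int b) {
    int sum = a + b;          // UB if this overflows
    return b > 0 && sum < a;  // "impossible" under the no-UB assumption
}

// Robust: checks BEFORE the operation, using only defined arithmetic.
bool would_overflow(int a, int b) {
    return b > 0 && a > INT_MAX - b;
}
```

The robust form never performs the overflowing operation at all, so there is nothing for the optimizer to reason away: `would_overflow(INT_MAX, 1)` is true, `would_overflow(1, 1)` is false.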
Lemire’s insight
Daniel Lemire popularized a mathematical property that detects overflow from the wrapped product alone, without widening to a 128-bit type. For unsigned multiplication a * x with a nonzero constant a, overflow happens if and only if (a * x) / a != x.
When there’s no overflow, multiplying then dividing recovers the original value. When overflow occurs, the product wraps smaller. Dividing that wrapped value by a gives a result strictly less than x. For unsigned types, division rounds down. So the recovered value is definitely not equal to the original x.
It’s elegant. It’s mathematically proven. And it needs nothing wider than the original type—you compute the (possibly wrapped) product, divide back, and compare. The inequality acts as a proxy for overflow without needing to know what the true, unwrapped product would have been.
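A quick sanity check of the recovery property, evaluated entirely at compile-time (the values are arbitrary):

```cpp
#include <cstdint>

constexpr uint64_t a = 8;

// No overflow: multiply-then-divide recovers x exactly.
static_assert((a * 100) / a == 100);

// Overflow: 8 * (UINT64_MAX / 7) exceeds 2^64. The wrapped product,
// divided back by 8, lands strictly below the original x.
constexpr uint64_t x = UINT64_MAX / 7;
static_assert((a * x) / a != x);
static_assert((a * x) / a < x);
```

If any assertion were false, compilation would fail—no runtime test harness required.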
Here’s the check:
#include <cstdint>

template<uint64_t a>
constexpr bool lemire_overflow_check(uint64_t x) {
    if (a == 0) return false;  // guard the division below
    if (x == 0) return false;  // 0 * a never overflows
    return (a * x) / a != x;   // Lemire's check: recovery fails iff overflow
}

// Compile-time verification
static_assert(!lemire_overflow_check<8>(100), "8*100 safe");
static_assert(lemire_overflow_check<8>(UINT64_MAX / 7), "8*(MAX/7) overflows");
The mathematics are sound. Implementation is where the surprise lives. Division on x86-64 is expensive—20+ cycle latency. A naive compilation would add serious overhead. Do modern optimizers recognize the pattern and collapse it?
On GCC 15.2.1 and Clang 21.1.8 with -O2 on x86-64, both compilers optimize the division-based check to identical assembly: a shift and a comparison, two instructions total.
Here’s the mechanism. The compiler recognizes that (a * x) / a != x is equivalent to asking whether x exceeds UINT64_MAX / a. It replaces the expensive division with cheap arithmetic (a right shift when a is a power of two; a multiply by a precomputed reciprocal otherwise), then folds the constant threshold. The hand-written form—x > UINT64_MAX / a—generates the same machine code: shrq and setne.
That’s sophisticated pattern matching. The optimizer must see through the division, recognize it as a threshold check, and replace it with the minimal form, then fold the constant. Both GCC and Clang do this reliably—I’ve verified it on real binaries.
Not all compilers fuse the pattern. Go’s compiler retains the actual division, which is expensive. On x86-64 with GCC 15 and Clang 21, you can rely on the optimization.
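To check your own toolchain, here is a minimal pair to paste into Compiler Explorer (function names are mine); at -O2 on x86-64, GCC 15 and Clang 21 should emit the same two-instruction body for both:

```cpp
#include <cassert>
#include <cstdint>

// Division form — what Lemire's check literally says
bool lemire_form(uint64_t x) { return (8 * x) / 8 != x; }

// Hand-written threshold form
bool threshold_form(uint64_t x) { return x > UINT64_MAX / 8; }

int forms_agree_demo() {
    // The two forms must agree everywhere; spot-check both sides
    assert(lemire_form(100) == threshold_form(100));           // both false
    uint64_t big = UINT64_MAX / 7;
    assert(lemire_form(big) && threshold_form(big));           // both true
    return 0;
}
```

If your compiler emits an actual `div` for `lemire_form`, you are in the unfused case and should fall back to the threshold form in hot paths.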
Where it gets free: constexpr
Wrapping the check in constexpr shifts the entire cost to compile-time when sizes are constants. The compiler evaluates everything at compile-time and embeds the result as a literal. No runtime cost. No branches. Just a number.
template<uint64_t element_size, uint64_t count>
struct SafeAllocation {
    // Evaluate the overflow check at compile-time. Guard count == 0 so
    // the division is never a constant-expression error.
    static constexpr bool overflows =
        count != 0 && (element_size * count) / count != element_size;
    static constexpr uint64_t size =
        overflows ? 0 : element_size * count;
};
// Usage: compute allocation size at compile-time
struct CompileTimeBuffer {
    static constexpr auto small = SafeAllocation<16, 10000>{};
    static_assert(!small.overflows);
    static_assert(small.size == 160000);
};
The compiled code for a function that returns small.size is two instructions:
allocate_buffer():
    movl $160000, %eax # Load the precomputed constant
    ret # Return
No multiplication, division, or check. The entire SafeAllocation computation—overflow verification included—happens at compile-time and bakes into the binary as a literal. For constant-size allocations, cost is zero.
Runtime variables are different. Cost drops to one comparison and a conditional set:
check_overflow_runtime(unsigned long):
    shrq $61, %rdi # x >> 61 is nonzero iff x > UINT64_MAX/8
    setne %al # Set if overflow
    ret
One shift and one conditional set—that’s the floor for 64-bit multiply-overflow detection.
Runtime checks still cost almost nothing
For runtime inputs, the optimization is automatic. Integer division is expensive—20-plus cycles of latency—but when the divisor is a compile-time constant, the optimizer swaps it for a shift.
I tested three variants on GCC 15.2.1 with -O2:
- x > UINT64_MAX / 8 (direct comparison)
- (8 * x) / 8 != x (Lemire’s form)
- multiply_safe<8>(x) (constexpr wrapper)
All three generate the same two-instruction assembly. The compiler recognizes (a * x) / a as division by a power of 2, replaces it with shifts, sees that comparing the shifted result against x reduces to a threshold check, and collapses the whole expression to shift + conditional set.
Clang 21.1.8 is identical. Both toolchains normalize Lemire’s form to minimal code. You get elegance and performance free. The optimizer pattern-matches through the algebra.
Building a production library
You need a single utility that works at compile-time and runtime, handles errors cleanly, and costs nothing for compile-time constants. Careful template design delivers this.
#include <cstdint>

namespace safe {

// Overflow check, constexpr-evaluable for all paths
template<uint64_t multiplier>
constexpr bool will_overflow(uint64_t x) {
    if (multiplier == 0 || x == 0) return false;
    return (multiplier * x) / x != multiplier;
}

// Returns true if safe, false if overflow
template<uint64_t multiplier>
constexpr bool multiply_safe(uint64_t x) {
    return !will_overflow<multiplier>(x);
}

// At compile-time: zero cost (constant-folded)
// At runtime: single comparison instruction
template<uint64_t multiplier>
constexpr uint64_t multiply_or_zero(uint64_t x) {
    return will_overflow<multiplier>(x) ? 0 : multiplier * x;
}

} // namespace safe
// Compile-time usage: zero cost
constexpr uint64_t buf_size = safe::multiply_or_zero<256>(1024);
static_assert(buf_size == 262144);
// Runtime usage: minimal cost
uint64_t user_input = ...;
uint64_t alloc_size = safe::multiply_or_zero<16>(user_input);
if (alloc_size == 0) {
    // Handle overflow (note: a genuinely zero-sized request also lands here)
}
Make the multiplier a template parameter. The compiler evaluates everything at compile-time when the multiplier is constant (which it always is in a template) and optimizes runtime paths where only input varies. Each instantiation gets specialized code without dispatch overhead.
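When the multiplier itself is only known at runtime, the template trick doesn’t apply. A hedged sketch of a fallback (the name is mine), with the caveat that here the divisor isn’t a constant, so the optimizer cannot replace the division with a shift—you pay for a real div instruction:

```cpp
#include <cassert>
#include <cstdint>

// Runtime-multiplier variant: still correct, but the division stays a
// genuine division because 'a' is not a compile-time constant.
constexpr bool will_overflow_rt(uint64_t a, uint64_t x) {
    if (a == 0 || x == 0) return false;  // zero factors never overflow
    return (a * x) / a != x;             // Lemire's recovery check
}

int rt_demo() {
    assert(!will_overflow_rt(8, 100));             // 800 fits easily
    assert(will_overflow_rt(8, UINT64_MAX / 7));   // exceeds 2^64
    assert(!will_overflow_rt(0, 5));               // zero multiplier: safe
    return 0;
}
```

Twenty-odd cycles per check is still cheap insurance on an allocation path, but if it matters, hoist the constant cases into the template form.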
Error handling: reach for std::expected (C++23). The error type adds zero codegen overhead.
#include <cstdint>
#include <expected> // C++23

struct OverflowError {
    const char* message = "Multiplication overflow";
};

template<uint64_t multiplier>
std::expected<uint64_t, OverflowError>
safe_multiply(uint64_t x) {
    // x == 0 must be guarded: the division below would divide by zero
    if (x != 0 && (multiplier * x) / x != multiplier) {
        return std::unexpected(OverflowError{"overflow"});
    }
    return multiplier * x;
}
// Usage
auto result = safe_multiply<8>(user_size);
if (result) {
    allocate_buffer(result.value());
} else {
    log_error(result.error().message);
}
GCC 15.2.1 and Clang 21.1.8 generate identical assembly for expected<uint64_t, OverflowError> and a hand-rolled version without error handling. The error type is a true zero-cost abstraction; the compiler eliminates all overhead. You still get minimal comparison and branch.
Compiler quirks you need to know
MSVC’s optimizer is conservative. It may not fuse Lemire’s pattern to a simple comparison; you might get more instructions and an actual division. I couldn’t test (no MSVC on Linux), so use Compiler Explorer (Godbolt) to verify before shipping overflow checks in hot paths. Lemire’s blog has Go users reporting the same issue—pattern doesn’t fuse, division stays expensive.
ARM is different. The instruction set differs; ARM has a dedicated CLZ (count-leading-zeros) instruction, while x86’s closest equivalents (BSR, and LZCNT on later chips) behave differently, opening different optimization paths. The division-based check should still be competitive, but verify assembly if ARM is a target. x86 codegen patterns don’t port automatically.
Modern C++ integration
C++20 concepts enforce type safety. Constrain safe-multiply to unsigned integral types:
#include <concepts>
#include <cstdint>

// Returns true on overflow. Require at least int width: for narrower T,
// a * x would promote to signed int, making overflow undefined behavior.
template<std::unsigned_integral T, T a>
    requires (a > 0 && sizeof(T) >= sizeof(int))
constexpr bool safe_multiply(T x) {
    return x != 0 && (a * x) / x != a;  // x == 0 guards the division
}

// Compile-time enforcement
// safe_multiply<signed int, 8>(x); // Error: not unsigned_integral
// safe_multiply<uint64_t, 0>(x); // Error: violates requires (a > 0)
The compiler enforces type and value constraints at compile-time. Call it with a signed type or zero multiplier: compile-time error. Whole categories of misuse vanish.
C++23 gave us std::expected, integrating cleanly with overflow checks. C++26 might formalize safe integer operations. Until then, Lemire’s pattern wrapped in constexpr is portable across GCC 15 and Clang 21.
Lemire’s division-based check is sound. Compilers optimize it well on x86-64. Wrapped in constexpr, it costs zero at compile-time and minimal cost at runtime. std::expected adds no codegen overhead.
Here’s the catch: you must verify assembly on your target compiler and architecture. MSVC may not fuse the pattern, leaving the division expensive. ARM codegen differs due to instruction set. -O2 is required; -O0 won’t optimize. Volatile inputs defeat constant-folding. The burden isn’t the check itself—it’s verification. You cannot assume codegen. You must verify it.
For systems code—buffer allocations, matrix dimensions, any multiplication that might overflow—it matters. Compile-time checks for constants eliminate a whole class of vulnerabilities for free: two instructions (load and return). Runtime checks for dynamic inputs cost one comparison and one conditional set. That’s negligible compared to unsafe functions or eating the risk.
Lemire’s insight is elegant. The optimization works on GCC 15 and Clang 21. Your job: verify assembly on your compiler and architecture, document constraints (compiler version, -O2, x86-64), and wire it into allocators. For constant sizes, cost is zero. For runtime dimensions, it’s negligible.