
C++ Profiles: What, Why, and How at using std::cpp 2026

Modern C++ // dev · April 21, 2026 · 8 min read

I spent a week after using std::cpp 2026 trying to answer one question about profiles: do the checks cost anything? The proposal sounds good on paper — opt-in safety enforcement per translation unit, compiler eliminates redundant checks. But “the compiler will optimize it away” is the oldest lie in C++. So I measured.

The short version: for bounds checks inside loops with known iteration counts, GCC 15 and Clang 21 both emit identical assembly with and without the check. Not “nearly identical.” Identical. Zero instructions of overhead. The interesting part is understanding why, and where it breaks down.

But first, the bugs.

What compiles clean and shouldn’t

Here is a function I would reject in code review that both major compilers accept without comment:

int idx = atoi(argv[1]);
int arr[8] = {};
arr[idx] = 42;   // UB if idx < 0 or idx >= 8

GCC 15 and Clang 21 are both silent at -Wall -Wextra -Wpedantic — zero warnings, because atoi’s return value is opaque to the optimizer and constant propagation never fires. Your test passes because you tested with index 3. Production takes index 10 and silently corrupts the stack.

UBSan catches it immediately:

claim-01-bounds-ub.cpp:24:12: runtime error: index 10 out of bounds for type 'int [8]'

A bounds profile would make that trap the default compilation mode. Not a sanitizer you remembered to turn on. Not a CI job that runs nightly. The default.
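Until a bounds profile exists, that guard has to be written by hand at every untrusted index. A minimal sketch of what the injected check would amount to (`checked_store` is my name, not anything from the proposal):

```cpp
#include <cstdio>
#include <cstdlib>

// Validate the runtime index before the write, terminating on violation
// instead of silently corrupting the stack. This is roughly the check a
// bounds profile would insert for you.
int checked_store(int idx) {
    int arr[8] = {};
    if (idx < 0 || idx >= 8) {                 // the check the profile would inject
        std::fprintf(stderr, "index %d out of bounds for int[8]\n", idx);
        std::abort();                          // trap, like the profile's runtime mode
    }
    arr[idx] = 42;
    return arr[idx];
}
```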

Type confusion: one compiler cares, one doesn’t

This is a pattern I still see in production code from people who should know better:

int read_via_c_cast(float f) {
    return *(int*)(&f);            // strict-aliasing UB
}

int read_via_bit_cast(float f) {
    return std::bit_cast<int>(f);  // defined behavior (C++20, <bit>)
}

Both produce 0x4048f5c3 for 3.14f. Both “work.” The first is undefined behavior that survives every platform you test on until an optimizer gets clever and breaks it.

GCC 15 has gotten better here than I expected. With -Wall (which pulls in -Wstrict-aliasing), it fires on both the C-style cast and the reinterpret_cast variant — “dereferencing type-punned pointer will break strict-aliasing rules.” Clang 21 under the same flags? Silent. Nothing.

That asymmetry is the problem a type safety profile would fix. The correctness of your aliasing code shouldn’t depend on which compiler binary happens to be in your PATH.
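If you cannot rely on C++20, the memcpy idiom is the portable defined-behavior alternative; both compilers recognize it and typically lower it to the same single register move as std::bit_cast at -O2. A sketch:

```cpp
#include <cstdint>
#include <cstring>

// Defined-behavior type punning without C++20: copy the object
// representation. No strict-aliasing violation, because no float object
// is ever accessed through an int lvalue.
std::int32_t read_via_memcpy(float f) {
    static_assert(sizeof(std::int32_t) == sizeof(float));
    std::int32_t i;
    std::memcpy(&i, &f, sizeof i);
    return i;
}
```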

The dangling view that -Wall misses

This one is my favorite because it’s so common:

std::string_view get_view() {
    std::string s = "hello world this string is long enough to defeat SSO";
    return std::string_view(s);
    // s destroyed here — returned view dangles
}

GCC 15, Clang 21, -Wall -Wextra — zero warnings from either. The string is long enough to heap-allocate (defeats small string optimization), so the returned view points into freed memory.

GCC’s -fanalyzer does catch it, and catches it well — a 24-event trace from allocation through destruction to the use site, tagged CWE-416:

warning: use after 'delete' of 'v.std::basic_string_view<char>::_M_str' [CWE-416]
   30 |     printf("size=%zu first=%c\n", v.size(), v[0]);

AddressSanitizer confirms: heap-use-after-free at the v[0] access.

The tools exist. -fanalyzer is genuinely good at catching this. But nobody runs it by default — it’s slow, it’s opt-in, and most CI pipelines skip it. A lifetime profile would make that diagnostic a property of normal compilation instead of something you have to remember to ask for.
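Until then, the reliable fix is ownership discipline: return the owner, and only form views over storage whose lifetime encloses every use of the view. A sketch of the rewrite:

```cpp
#include <string>
#include <string_view>

// Safe version: return the owning string rather than a view into a local.
std::string get_owner() {
    return "hello world this string is long enough to defeat SSO";
}

// The caller owns the storage, so a view scoped inside it cannot dangle.
std::size_t use() {
    std::string s = get_owner();   // s owns the heap buffer
    std::string_view v{s};         // v's lifetime is nested inside s's
    return v.size();               // safe: s is still alive here
}
```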

The assembly doesn’t lie

Every safety proposal for C++ dies on the same question: what does it cost? If bounds checks add a branch to every array access, the proposal is dead for anything latency-sensitive. Profiles bet on the optimizer: prove the check redundant, emit nothing.

I tested this directly. Two functions — identical logic, one with an explicit bounds check:

// Unchecked
int sum_raw(const int* p, int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i)
        sum += p[i];
    return sum;
}

// Checked
int sum_with_check(const int* p, int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i) {
        if (__builtin_expect(i >= n, 0)) __builtin_trap();
        sum += p[i];
    }
    return sum;
}

The check i >= n inside a loop guarded by i < n is always false. The question is whether the compiler actually proves that.

On GCC 15.2.1 at -O2, the assembly is instruction-for-instruction identical:

; sum_raw                          ; sum_with_check
  testl   %esi, %esi              ;   testl   %esi, %esi
  jle     .L_exit                 ;   jle     .L_exit
  movslq  %esi, %rsi              ;   movslq  %esi, %rsi
  xorl    %eax, %eax              ;   xorl    %eax, %eax
  leaq    (%rdi,%rsi,4), %rdx     ;   leaq    (%rdi,%rsi,4), %rdx
.L_loop:                           ; .L_loop:
  addl    (%rdi), %eax            ;   addl    (%rdi), %eax
  addq    $4, %rdi                ;   addq    $4, %rdi
  cmpq    %rdx, %rdi              ;   cmpq    %rdx, %rdi
  jne     .L_loop                 ;   jne     .L_loop
  ret                             ;   ret

No ud2. No int3. No trap instruction anywhere in the checked version. GCC proves the branch is dead and emits nothing. Zero bytes of overhead.

Clang 21 reaches the same result through a different path — it vectorizes both functions with SIMD, and the vectorized versions are structurally identical too.

The runtime numbers confirm what the assembly shows. Over a 65,536-element integer array on an i7-4790 at 3.6 GHz, GCC 15 at -O2:

Variant                    Median     Throughput (items/s)
Raw pointer (no check)     11.4 µs    5.77 × 10⁹
Runtime checked            11.3 µs    5.80 × 10⁹

The difference is noise. CV < 2.3% for both variants.
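For reproduction, a minimal timing harness along these lines works; the original setup isn't specified beyond compiler, CPU, and array size, so treat the function names and run count here as my own choices:

```cpp
#include <algorithm>
#include <chrono>
#include <vector>

int sum_raw(const int* p, int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i)
        sum += p[i];
    return sum;
}

// Time sum_raw over a 65,536-element array and return the median of
// `runs` wall-clock samples, in nanoseconds.
long long median_ns(int runs = 101) {
    std::vector<int> data(65536, 1);
    volatile int sink = 0;            // keep the call from being optimized out
    std::vector<long long> samples;
    for (int r = 0; r < runs; ++r) {
        auto t0 = std::chrono::steady_clock::now();
        sink = sum_raw(data.data(), static_cast<int>(data.size()));
        auto t1 = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count());
    }
    (void)sink;
    std::nth_element(samples.begin(), samples.begin() + runs / 2, samples.end());
    return samples[runs / 2];
}
```

Taking the median rather than the mean discards scheduler and cache-warmup outliers, which is where most of the run-to-run variance lives.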

This is why profiles are an engineering bet worth taking — not a white-paper idea. The check exists in source for the cases where the optimizer can’t prove safety — function boundaries, runtime-dependent indices, pointer provenance that crosses translation units. Inside a tight loop with a visible bound, the optimizer removes the check completely. It doesn’t minimize the cost. It proves the check unreachable and emits zero instructions.
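For contrast, here is a sketch of the boundary case: when the index arrives across a function boundary, nothing in scope proves it in range, so the branch must survive into the binary (`load_checked` is my naming, and `__builtin_trap` assumes GCC/Clang):

```cpp
#include <cstddef>

// The index is opaque at this boundary. The compiler cannot prove
// i < n, so the check compiles to a real test-and-branch. This is the
// bounded-cost case, not the zero-cost one.
int load_checked(const int* p, std::size_t n, std::size_t i) {
    if (i >= n) __builtin_trap();   // not provably dead here
    return p[i];
}
```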

What profiles are not

Profiles don’t change C++ semantics or break conforming code. They are a set of constraints — a subset of C++ that excludes certain constructs — enforced per translation unit. You opt in at the compiler flag level, similar to -fno-exceptions.

They’re also not a replacement for sanitizers. Sanitizers instrument everything and catch violations at runtime with full diagnostics. Profiles shift the strategy: diagnose what’s catchable statically, trap what must be caught at runtime, let the optimizer remove what it can prove. The production binary carries checks only for accesses where the compiler genuinely cannot prove correctness.

Where this actually stands

The WG21 discussion at using std::cpp 2026 focused on the boundary cases — the accesses where the optimizer doesn’t eliminate the check. Virtual dispatch. Indirect calls. Cross-TU pointer provenance. Nobody has comprehensive measurements for those cases yet. The loop-internal pattern I tested is the best case. Beyond that, everything is extrapolation.

Profiles are targeting a Technical Specification, not C++26 directly. Compiler vendors can implement and ship without waiting for the next standard cycle. GCC’s optimizer already handles the cooperative pattern I showed — proving loop-internal checks redundant — so the compiler infrastructure cost is incremental, not architectural.

If you’re already running sanitizers in CI, profiles offer something sanitizers can’t: carrying the checks into optimized production builds at zero cost where the optimizer proves them dead, and at bounded cost where it can’t. The status quo — shipping -O2 with no safety checks and hoping the test suite found everything — is how the CVE reports keep happening.

Committee politics are not resolved. But the assembly is. For the common case, the check is free.