#Performance

9 articles

When JSON Schema Crashes Your Inference Server: Regex DoS in C++

Feed `std::regex` a pathological pattern and a crafted input, and watch it spiral. Input length 16 characters: 4.48 milliseconds. Input length 18: 18 milliseconds. Input length 20: over a second.

Modern C++ // dev May 8, 2026 9 min read

Compilers Performance Data Structures LLVM

LLVM's Flat-Buffer Tree for IR Dominators: O(1) Reads vs O(n) Moves

Compiler optimization passes live and die on tree traversal. LLVM's dominator analysis alone queries ancestor relationships thousands of times per function. A real C++ translation unit with heavy…

Modern C++ // dev May 5, 2026 9 min read

C++26 Performance Safety Standards

Compile-Time Unsigned Overflow Detection in C++: From `if`-checks to `constexpr` Assertions

Unsigned overflow wraps. That's the contract. C++ won't catch it, the CPU won't trap it. But your size calculation that wraps to a smaller buffer? That's a memory corruption vulnerability, and it's…

Modern C++ // dev May 1, 2026 9 min read

SIMD Performance Algorithms Benchmarks

Beyond Binary Search: When Interpolation Beats Quaternary, Radix, and SIMD

You have a sorted array. Need to search it. `std::lower_bound`: O(log n), predictable, robust. This is the default solution. But my industry experience breeds skepticism. Every few months, a new paper…

Modern C++ // dev Apr 28, 2026 10 min read

C++26 Concurrency Performance Standards

Parallel execution for loops: C++26's work-stealing scheduler under the hood

I've spent enough time staring at `perf stat` output to recognize a pattern: OpenMP's dynamic scheduler clocks in at **55,593 ops/sec** on Zipf-distributed task costs with 512 tasks. That's about 18 microseconds per op…

Modern C++ // dev Apr 24, 2026 9 min read

Performance AI/ML SIMD Optimization

Kernel Fusion on CPU: What llama.cpp's RMS_NORM + MUL Fusion Teaches Us About LLM Performance

llama.cpp's PR #22423 landed a kernel fusion for RMS_NORM + MUL in the ggml CPU backend a few weeks ago. The speedup: 1.60×. Consistently. Across dimension sizes, thread counts, even hardware…

Modern C++ // dev Apr 21, 2026 7 min read

C++26 Move Semantics Standards Performance

C++26 Move Semantics: What's New Since CppCon 2025 Basics Talk

If you watched Ben Saks's CppCon 2025 'Back to Basics: Move Semantics' talk, you know what moves are and why the compiler calls them. That talk is solid. C++26 doesn't contradict it. What it does is…

Modern C++ // dev Apr 10, 2026 8 min read

SIMD Performance x86 Optimization

Designing a SIMD Algorithm from Scratch

I manually unrolled a byte-counting loop with four independent accumulators (the textbook ILP optimization), and it ran 2.08x *slower* than the plain loop. The plain loop that GCC had quietly autovectorized…

Modern C++ // dev Mar 31, 2026 10 min read

Performance PGO Compilers Optimization

Profile-Guided Optimization Made Our Code Slower

That's the whole story. I took a virtual-dispatch interpreter loop — the textbook PGO target — instrumented it, trained it on a representative workload, and recompiled. Both GCC 15.2.1 and Clang 21.1.

Modern C++ // dev Mar 10, 2026 8 min read