When JSON Schema Crashes Your Inference Server: Regex DoS in C++
Feed `std::regex` a pathological pattern and a crafted input, and watch it spiral. Input length 16 characters: 4.48 milliseconds. Input length 18: 18 milliseconds. Input length 20: over a second. …
Modern C++ // dev May 8, 2026 9 min read
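Below is a minimal sketch of the failure mode the excerpt describes. It assumes the classic nested-quantifier pattern `(a+)+b`, because the article's actual pattern and input are not shown here, and `time_pathological_match` is a hypothetical helper name. The input is a run of `'a'` characters with no trailing `'b'`, so the match must fail. A backtracking engine tries every way of splitting the run between the inner and outer `+` before it can report that failure.

```cpp
#include <chrono>
#include <cstddef>
#include <regex>
#include <string>

// Result of one timed match attempt.
struct MatchTiming {
    bool matched;   // whether std::regex_match succeeded
    double millis;  // wall-clock time of the call, in milliseconds
};

// Hypothetical helper: times std::regex_match on `n` 'a' characters against
// the illustrative pattern (a+)+b (an assumption, not the article's pattern).
// Because the input never ends in 'b', every attempt fails, and a
// backtracking matcher explores many ways to partition the 'a's first.
MatchTiming time_pathological_match(std::size_t n) {
    static const std::regex pattern("(a+)+b");  // compile once, reuse
    const std::string input(n, 'a');

    const auto start = std::chrono::steady_clock::now();
    const bool matched = std::regex_match(input, pattern);
    const auto stop = std::chrono::steady_clock::now();

    return {matched,
            std::chrono::duration<double, std::milli>(stop - start).count()};
}
```

Calling it for n = 14, 16, 18, … and printing `millis` should show the time growing roughly exponentially on a backtracking implementation. The exact figures depend on the standard library and machine, so expect the trend rather than the specific numbers quoted above, and keep n modest when experimenting.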
LLVM's Flat-Buffer Tree for IR Dominators: O(1) Reads vs O(n) Moves
Compiler optimization passes live and die on tree traversal. LLVM's dominator analysis alone queries ancestor relationships thousands of times per function. A real C++ translation unit with heavy…
Modern C++ // dev May 5, 2026 9 min read
Compile-Time Unsigned Overflow Detection in C++: From `if`-checks to `constexpr` Assertions
Unsigned overflow wraps. That's the contract. C++ won't catch it; the CPU won't trap it. But your size calculation that wraps to a smaller buffer? That's a memory corruption vulnerability, and it's…
Modern C++ // dev May 1, 2026 9 min read
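As a minimal sketch of the progression in that title, assume the size calculation is a single multiplication of two `std::size_t` values. `checked_mul` and `kMax` are hypothetical names. The check divides before it multiplies, so a wrapped product is never formed. Because the function is `constexpr`, the same check can run inside a `static_assert` for sizes known at compile time.

```cpp
#include <cstddef>
#include <limits>
#include <optional>

constexpr std::size_t kMax = std::numeric_limits<std::size_t>::max();

// Hypothetical helper: returns a * b, or std::nullopt if the product would
// exceed SIZE_MAX and wrap. The guard runs before the multiplication.
constexpr std::optional<std::size_t> checked_mul(std::size_t a, std::size_t b) {
    if (a != 0 && b > kMax / a) {
        return std::nullopt;  // a * b > kMax, so the real product would wrap
    }
    return a * b;
}

// Compile-time use: if either condition were false, the build would fail.
static_assert(checked_mul(1024, 1024).value() == 1024u * 1024u);
static_assert(!checked_mul(kMax / 2 + 1, 2).has_value());
```

At runtime the caller decides what a `std::nullopt` means (reject the request, throw, log). GCC and Clang also provide `__builtin_mul_overflow`, which performs the same check as a compiler builtin.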
Beyond Binary Search: When Interpolation Beats Quaternary, Radix, and SIMD
You have a sorted array. Need to search it. `std::lower_bound`: O(log n), predictable, robust. This is the default solution. But my industry experience breeds skepticism. Every few months, a new paper…
Modern C++ // dev Apr 28, 2026 10 min read
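For reference, here is a minimal interpolation search that keeps `std::lower_bound`'s contract (index of the first element `>= key`). It is a sketch, not the article's benchmarked code, and `interpolation_lower_bound` and `probe_budget` are hypothetical names. Instead of probing the midpoint, it guesses a position proportional to where `key` sits between the range's endpoints. That converges quickly on evenly spread keys but can need O(n) probes on skewed data, so the sketch falls back to `std::lower_bound` after a fixed probe budget.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the index of the first element >= key; `v` must be sorted ascending.
std::size_t interpolation_lower_bound(const std::vector<std::int64_t>& v,
                                      std::int64_t key, int probe_budget = 32) {
    std::size_t lo = 0, hi = v.size();  // invariant: v[<lo] < key <= v[>=hi]
    while (lo < hi && probe_budget-- > 0) {
        if (v[lo] >= key) return lo;     // everything before lo is < key
        if (v[hi - 1] < key) return hi;  // the whole range is < key
        // Now v[lo] < key <= v[hi - 1]. Differences are taken in uint64_t so
        // they stay exact even when values span the full int64_t range.
        const auto num = static_cast<std::uint64_t>(key) - static_cast<std::uint64_t>(v[lo]);
        const auto den = static_cast<std::uint64_t>(v[hi - 1]) - static_cast<std::uint64_t>(v[lo]);
        const long double frac = static_cast<long double>(num) / static_cast<long double>(den);
        std::size_t pos = lo + static_cast<std::size_t>(frac * static_cast<long double>(hi - 1 - lo));
        if (pos > hi - 1) pos = hi - 1;  // guard against rounding
        if (v[pos] < key) lo = pos + 1; else hi = pos;
    }
    // Budget exhausted or range empty: finish with ordinary binary search.
    const auto first = v.begin() + static_cast<std::ptrdiff_t>(lo);
    const auto last = v.begin() + static_cast<std::ptrdiff_t>(hi);
    return static_cast<std::size_t>(std::lower_bound(first, last, key) - v.begin());
}
```

Every probe either raises `lo` or lowers `hi`, so the loop always terminates. The budget only caps how long the interpolation guesses are trusted before switching to the predictable O(log n) path.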
Parallel execution for loops: C++26's work-stealing scheduler under the hood
I've spent enough time staring at `perf stat` output to recognize a pattern: OpenMP's dynamic scheduler clocks in at **55,593 ops/sec** on Zipf-distributed task costs with 512 tasks. That's about 18 microseconds…
Modern C++ // dev Apr 24, 2026 9 min read
Kernel Fusion on CPU: What llama.cpp's RMS_NORM + MUL Fusion Teaches Us About LLM Performance
llama.cpp's PR #22423 landed a kernel fusion for RMS_NORM + MUL in the ggml CPU backend a few weeks ago. The speedup: 1.60×. Consistently. Across dimension sizes, thread counts, even hardware variation…
Modern C++ // dev Apr 21, 2026 7 min read
P3373R2: The Case for a Standardized Low-Latency I/O API
Here's the uncomfortable truth: modern C++ standard library I/O becomes a bottleneck at scale. Traditional POSIX APIs introduce 1–10 microseconds of latency per operation due to syscall overhead and…
Modern C++ // dev Apr 17, 2026 10 min read
C++26 Move Semantics: What's New Since CppCon 2025 Basics Talk
If you watched Ben Saks's CppCon 2025 'Back to Basics: Move Semantics' talk, you know what moves are and why the compiler calls them. That talk is solid. C++26 doesn't contradict it. What it does is…
Modern C++ // dev Apr 10, 2026 8 min read
Designing a SIMD Algorithm from Scratch
I manually unrolled a byte-counting loop with four independent accumulators (the textbook ILP optimization), and it ran 2.08× *slower* than the plain loop. The plain loop that GCC had quietly autovectorized…
Modern C++ // dev Mar 31, 2026 10 min read
SIMD-accelerated computer vision on a $2 microcontroller
The RP2350 has a feature most embedded developers ignore. Two Cortex-M33 cores at 150 MHz, 520 KB of SRAM, and $0.80 in quantity. Buried in the ISA: packed arithmetic instructions that process four 8-bit…
Modern C++ // dev Mar 24, 2026 13 min read
Profile-Guided Optimization Made Our Code Slower
That's the whole story. I took a virtual-dispatch interpreter loop (the textbook PGO target), instrumented it, trained it on a representative workload, and recompiled. Both GCC 15.2.1 and Clang 21.1…
Modern C++ // dev Mar 10, 2026 8 min read
Lock-Free Queue Implementations Compared: Correctness, Performance, and the Bugs You'll Ship
A `std::mutex`-protected `std::deque` is 12% faster than `moodycamel::ConcurrentQueue` when contention is low.
Modern C++ // dev Mar 6, 2026 12 min read
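For reference, the baseline in that claim looks roughly like the following minimal sketch. The article's exact harness is not shown, and `MutexQueue` and `try_pop` are hypothetical names. Every operation takes the same `std::mutex`, and `try_pop` returns `std::nullopt` on an empty queue instead of blocking.

```cpp
#include <deque>
#include <mutex>
#include <optional>
#include <utility>

// A std::deque guarded by one std::mutex; every operation holds the lock.
template <typename T>
class MutexQueue {
public:
    void push(T value) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push_back(std::move(value));
    }

    // Non-blocking pop: std::nullopt when the queue is empty.
    std::optional<T> try_pop() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return std::nullopt;
        T front = std::move(items_.front());
        items_.pop_front();
        return front;
    }

private:
    std::mutex mutex_;
    std::deque<T> items_;
};
```

When contention is low, each lock acquisition is usually uncontended and cheap, which is the regime where the excerpt reports this simple design pulling ahead.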
Cache-Line Archaeology: Finding and Fixing False Sharing in Production
Your threads are doing independent work on independent data, and yet adding a second thread makes everything six times slower. This is false sharing, and it hides in struct layouts and thread-local counters across more production codebases than anyone wants to admit.
Modern C++ // dev Feb 27, 2026 8 min read
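As a minimal sketch of the struct-layout problem and the usual fix, assume 64-byte cache lines, which hold for most current x86-64 and many ARM cores. C++17's `std::hardware_destructive_interference_size` expresses this portably, but not every standard library ships it, so the value is hard-coded here. `PackedCounters`, `PaddedCounter`, and `run_padded` are hypothetical names.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Assumption: 64-byte cache lines on the target machine.
constexpr std::size_t kCacheLine = 64;

// False-sharing-prone layout: all four counters fit in one cache line, so an
// increment from one thread invalidates that line in every other core.
struct PackedCounters {
    std::atomic<std::uint64_t> hits[4];
};
static_assert(sizeof(PackedCounters) <= kCacheLine, "all four share one line");

// Fix: align (and therefore pad) each counter to its own cache line.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

// Each worker increments only its own padded slot; returns the grand total.
std::uint64_t run_padded(std::size_t threads, std::uint64_t iters) {
    std::vector<PaddedCounter> slots(threads);
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < threads; ++t) {
        workers.emplace_back([&slots, t, iters] {
            for (std::uint64_t i = 0; i < iters; ++i)
                slots[t].value.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();
    std::uint64_t total = 0;
    for (auto& s : slots) total += s.value.load();
    return total;
}
```

The padding trades memory (64 bytes per 8-byte counter) for independence between threads. Timing a variant that uses `PackedCounters` against `run_padded` on your own hardware is the way to see how large the effect is there.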