
When JSON Schema Crashes Your Inference Server: Regex DoS in C++

Modern C++ · May 10, 2026 · 9 min read

Feed std::regex a pathological pattern and a crafted input, and watch it spiral. Input length 16 characters: 4.48 milliseconds. Input length 18: 18 milliseconds. Input length 20: over 70 milliseconds, and every two additional characters quadruple it from there. The pattern is simple: (a+)+b against a string of ‘a’s ending in ‘X’. No match exists. The regex engine explores every possible partition of the ‘a’s across the nested quantifiers, backtracks, and tries again. Exponentially many times.

In an inference server where latency is the SLA—measured in single-digit milliseconds—a schema that takes a second to validate isn’t a performance problem. It’s a denial of service. And the attacker doesn’t need to exploit a bug in your code. They need only craft a JSON schema, hand it to a user, and let the user submit it to your server.

This happened to llama.cpp. The framework’s GBNF parser accepted JSON schemas and compiled them to regex patterns on the fly, during inference. A malicious schema could hang the server, exhaust the stack, or read out of bounds. The vulnerability report (PR #22606) involved bounds checking failures and unchecked recursion depth. But the root problem wasn’t any single bug—it was accepting std::regex as a safe building block when validating untrusted input. The standard library’s regex engine guarantees nothing about performance. It’s flexible and feature-complete. It’s also a liability.

Why Structured Outputs Need Untrusted Schema Validation

Structured outputs are now the norm. OpenAI’s function calling, Claude’s structured output constraints, and llama.cpp’s GBNF system let clients specify JSON schemas that constrain the model’s output token by token. It works: the model generates only valid JSON, with the right types, the required fields, no hallucinations in a field that should be a number.

The problem is that schemas come from users. If a user can supply a schema, they can supply a malicious one. And if the server validates that schema using std::regex—which is what llama.cpp did—the attacker wins.

What Actually Broke: The llama.cpp Vulnerability

In PR #22606, the llama.cpp team patched their GBNF regex parser. The vulnerability wasn’t a bug in std::regex itself. It was the absence of boundaries around it.

GBNF compiles a JSON schema to a regex pattern and matches it at inference time, token by token. If the pattern is pathological—nested quantifiers, overlapping alternations, deep recursion—the regex engine consumes exponential time, exhausts the stack, or reads out of bounds. The fix was straightforward: bounds checks on buffer access, recursion limits during pattern parsing, and validation of pattern structure before compilation.
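A recursion limit of the kind the fix added can be sketched as a cheap pre-pass over the pattern text. This is an illustrative sketch, not llama.cpp's actual code; the cap value and function names are assumptions.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical depth cap -- real code would tune this to the grammar.
constexpr std::size_t kMaxDepth = 64;

// Returns the maximum nesting depth of '(' groups in a pattern, throwing
// if nesting exceeds the cap or parentheses are unbalanced. Hostile or
// malformed patterns fail fast here instead of recursing in the parser.
std::size_t check_depth(const std::string& pattern) {
    std::size_t depth = 0, max_depth = 0;
    for (char c : pattern) {
        if (c == '(') {
            if (++depth > kMaxDepth)
                throw std::runtime_error("pattern nests too deeply");
            if (depth > max_depth) max_depth = depth;
        } else if (c == ')') {
            if (depth == 0)
                throw std::runtime_error("unbalanced ')'");
            --depth;
        }
    }
    if (depth != 0) throw std::runtime_error("unbalanced '('");
    return max_depth;
}
```

Running this before handing the pattern to any regex engine turns a stack-exhaustion crash into a clean rejection.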

This highlights the core problem. std::regex is a general-purpose regex library: flexible, feature-rich (backreferences, lookahead), and entirely without performance guarantees. The standard specifies no complexity bounds for matching. Exposing it to untrusted input without explicit safeguards is the mistake, and it isn't specific to llama.cpp: any system that accepts schemas from users faces the same trap.

The Mechanics: Why Backtracking Blows Up

std::regex uses backtracking. When a branch fails, the engine rewinds and tries the next alternative. On pathological patterns, there are exponentially many branches to explore.

The pattern (a+)+b on input aaaa...aaX is the canonical example. The outer ()+ can consume 1 ‘a’, or 2, or 3, etc. For each choice, the inner + fills the rest. For a 20-character input, there are 2^20 possible partitions. The engine tries them all, finds no match, and gives up. Alone on a single input, it’s a curiosity. In production, it’s a vector.

Benchmarks on GCC 15.2.1, -O2, i7-4790:

  • 10 characters: 73.7 microseconds
  • 12 characters: 281.5 microseconds (3.8×)
  • 14 characters: 1.11 milliseconds (3.9×)
  • 16 characters: 4.48 milliseconds (4.0×)
  • 18 characters: 18 milliseconds
  • 20 characters: 73+ milliseconds (and rising)

Each pair of additional characters roughly quadruples the time. That's exponential growth. For inference at single-digit-millisecond SLAs, 4 milliseconds per validation on a 16-character payload already eats your latency budget. A few characters past 20 and you're into whole seconds.
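The quadrupling is easy to reproduce with a minimal timing harness. A sketch (function name is illustrative); keep n at 16 or below unless you want to wait:

```cpp
#include <chrono>
#include <regex>
#include <string>
#include <utility>

// Times a single std::regex_search of the pathological pattern "(a+)+b"
// against n 'a's followed by 'X'. No match is possible, so the engine
// must backtrack through every partition of the 'a's before giving up.
// Returns {matched, elapsed_microseconds}.
std::pair<bool, long long> time_pathological(int n) {
    static const std::regex re("(a+)+b");
    std::string input(n, 'a');
    input += 'X';  // kill the match -> worst-case backtracking
    auto t0 = std::chrono::steady_clock::now();
    bool matched = std::regex_search(input, re);
    auto t1 = std::chrono::steady_clock::now();
    long long us = std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
    return {matched, us};
}
```

Calling this with n = 10, 12, 14, 16 should show each step taking roughly 4× the previous one, matching the table above within noise.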

Stack as a Second Attack Vector

Deep recursion in pattern parsing is a separate vulnerability. Patterns with hundreds of levels of alternation, like (a|b|c|...|z) repeated 200 times, force the regex parser to recurse on the pattern structure itself. In standard deployments with 8 MB stacks, modern std::regex survives: a test with roughly 200 alternation levels on GCC 15.2.1 compiled and matched without crashing, which suggests some safeguards exist. But the vulnerability surface is there. In a container with a 2 MB stack (or a hardened environment with even less), a pattern that parses cleanly on one system can crash on another.

Out-of-Bounds Access as the Third Vector

And then there are implementation bugs. If bounds checking in the regex parser is inconsistent—buffer assumed to be 300 bytes in one code path but checked as 256 in another—a malformed pattern triggers a read or write out of bounds. This was the llama.cpp issue: inconsistent buffer size assumptions in the GBNF parser, exposed by crafted input.

Defense: Replace the Engine

re2 is the solution. Google’s library compiles regex patterns to deterministic finite automata (DFAs) instead of using backtracking. No backtracking, no exponential blowup. Matching time is O(n), linear in the input length.

The same pathological pattern on re2:

  • 10 characters: 103.0 nanoseconds
  • 12 characters: 105.1 nanoseconds (1.02×)
  • 14 characters: 120.5 nanoseconds (1.15×)
  • 16 characters: 110.7 nanoseconds (1.07×)
  • 20 characters: 119.2 nanoseconds

Flat. No growth.

On normal patterns? re2 is faster:

Pattern   std::regex   re2      Ratio
Email     698 ns       140 ns   5.0×
UUID      324 ns       159 ns   2.0×
Date      121 ns       59 ns    2.1×

The DFA construction cost amortizes at compile time, not per-match. For inference servers compiling a schema once and using it for thousands of requests, re2 is strictly better: safer and faster.

re2 is packaged (re2-devel on Fedora, libre2-dev on Debian), the API is close to std::regex, and migration takes hours. Drop the header, link -lre2, replace std::regex with re2::RE2. Done.

re2 doesn’t support backreferences, lookahead, or lookbehind. For JSON schema validation, you don’t need them. Character classes, quantifiers, alternation, anchors: re2 does all of that, safely.

Defense: Constrain the Stack

If you’re stuck with std::regex (for now), constrain the stack. Use setrlimit(RLIMIT_STACK, ...) to reduce from 8 MB to 4 MB or 2 MB. Deep recursion in the regex parser hits the limit and crashes quickly instead of hanging.

#include <sys/resource.h>
#include <cstdio>

struct rlimit rl;
rl.rlim_cur = 4 * 1024 * 1024;  // 4 MB soft limit
rl.rlim_max = 4 * 1024 * 1024;  // 4 MB hard limit
if (setrlimit(RLIMIT_STACK, &rl) != 0) {
    perror("setrlimit");
}

This is blunt. Legitimate patterns that recurse deeply might also fail. It’s a band-aid, not a fix. But it beats a hanging server.

Defense: Validate Before Compiling

Before compiling any regex from untrusted input, validate it:

  1. Reject nested quantifiers: (a+)+, (a*)*
  2. Reject deep alternation: cap nesting to 3 levels
  3. Cap pattern length: reject anything over 1000 characters
  4. Precompile a whitelist of safe schemas if you can

This is what llama.cpp added after the vulnerability. Parse the regex as a tree, walk it, flag any quantifier with another quantifier as a child. Track alternation depth. These heuristics catch 90% of the easy attacks. They won’t catch every possible ReDoS (the problem is theoretically hard), but they catch the weaponizable ones.
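A crude version of the nested-quantifier check can be done without a full parse tree, by scanning for a quantified group that is itself quantified. This is a heuristic sketch with illustrative names, far weaker than a real tree walk; it catches the (a+)+ shape but not, say, (a|a)+:

```cpp
#include <cstddef>
#include <string>

// True for the three common quantifier characters.
bool is_quantifier(char c) { return c == '+' || c == '*' || c == '?'; }

// Rejects a pattern that is too long or contains the textual shape
// "X+)+" / "X*)*" etc. -- a quantifier immediately inside a group whose
// closing paren is itself quantified, the canonical ReDoS construction.
bool looks_safe(const std::string& pattern, std::size_t max_len = 1000) {
    if (pattern.size() > max_len) return false;
    for (std::size_t i = 0; i + 2 < pattern.size(); ++i) {
        if (is_quantifier(pattern[i]) && pattern[i + 1] == ')' &&
            is_quantifier(pattern[i + 2]))
            return false;
    }
    return true;
}
```

Production code should parse the pattern into a tree and walk it as described above; this scan is only the cheapest first line of defense.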

The Full Benchmark Suite

Testing four attack patterns against std::regex and re2 on a 20-character input:

Pattern     std::regex     re2
(a+)+b      73.6 ms        119 ns
(a|a)+b     exponential    ~linear
(a|ab)+b    exponential    ~linear
(.*)*b      catastrophic   ~linear

This highlights re2’s superiority. std::regex falls into exponential worst cases as a consequence of its backtracking engine; re2 has none, by design. Automata guarantee O(n) matching, period. For inference servers compiling once and matching thousands of times, the construction cost is invisible and the win is total.

What Other Languages Did

Rust’s regex crate chose safety. It forbids backreferences, lookahead, and lookbehind, but guarantees O(n) matching with automata. Most Rust developers never miss those features. Go did the same: automata-based matching, linear-time guarantee.

Python’s jsonschema validates the schema against a meta-schema first, catching most pathological patterns before they’re compiled. It’s defensive but not bulletproof. A sufficiently clever pattern can still slip through.

C++ has no equivalent in the standard library. You get std::regex: flexible, powerful, and unsafe. The choice is explicit: safety or convenience. Until the standard adds a safe linear-time option, re2 remains the go-to solution.

Reproducing the Attack

Every number in this article comes from real benchmark runs, not synthesis. GCC 15.2.1, clang 21.1.8, i7-4790, GCC flag -O2.

The test is trivial:

#include <regex>
#include <string>

std::string pattern = "(a+)+b";
std::string input(20, 'a');
input += 'X';  // No match — triggers full backtracking

std::regex re(pattern);
std::smatch match;
bool result = std::regex_search(input, match, re);

With std::regex, this takes tens of milliseconds at 20 characters and whole seconds a few characters beyond. With re2, it returns instantly.

Defenders can reproduce this by building the investigation artifacts. The containerized setup ensures compiler versions, flags, and hardware are consistent.

The Bottom Line

If you’re building an inference server in C++:

Use re2. It’s faster than std::regex on normal patterns, and linear on pathological ones. DFA construction is cheap when amortized. Migration takes hours, not weeks. There’s no reason to use std::regex for schema validation.

If you must stick with std::regex (legacy code, etc.):

  • Validate schemas before compilation: reject nested quantifiers, cap alternation depth to 3, cap pattern length.
  • Monitor compile time: anything over 100 ms is suspicious.
  • Constrain the stack: setrlimit(RLIMIT_STACK) to 4 MB or 2 MB.
  • These are band-aids. Plan to migrate to re2.
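The compile-time monitoring bullet can be sketched as a thin wrapper around regex construction. Note the limitation: this cannot abort a compile already in progress, it only refuses to use a pattern after the fact, so pair it with pre-validation and the stack limit. Names and the 100 ms budget are illustrative:

```cpp
#include <chrono>
#include <optional>
#include <regex>
#include <string>

// Compiles a pattern, measuring how long construction took. Returns
// nullopt if the pattern is malformed or compilation exceeded the budget
// (a suspiciously slow compile is itself a red flag for hostile input).
std::optional<std::regex> compile_with_budget(
    const std::string& pattern,
    std::chrono::milliseconds budget = std::chrono::milliseconds(100)) {
    auto t0 = std::chrono::steady_clock::now();
    try {
        std::regex re(pattern);
        if (std::chrono::steady_clock::now() - t0 > budget)
            return std::nullopt;  // took too long to compile: reject
        return re;
    } catch (const std::regex_error&) {
        return std::nullopt;  // malformed pattern
    }
}
```

A rejected schema should produce a 4xx error to the client, never a retry loop.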

The llama.cpp vulnerability was preventable. The fixes weren’t subtle—bounds checks, pattern validation, recursion limits. But they required explicit attention to the regex engine’s behavior. In a world where JSON schemas are now first-class features in LLM APIs, that attention is mandatory. A single malicious schema can destroy service availability. The decision to use std::regex for untrusted input is the mistake. Everything else follows.