
C++23 std::stacktrace: Never Debug Blind Again

Modern C++ // dev April 21, 2026 9 min read

I have written the #ifdef tower — backtrace() on Linux, CaptureStackBackTrace() on Windows, dladdr for symbol resolution on one side, dbghelp.dll on the other — more times than I care to admit. Then you spend a day debugging your debug infrastructure, which is the kind of recursive misery that makes you question your career choices.

C++23 ships <stacktrace>. One header. Portable, demangled frames with source locations. And on GCC 15.2.1 with libstdc++, it works today. The catch is in the details, because this is C++ and the catch is always in the details.

What You Need

Feature-test first:

#include <stacktrace>
static_assert(__cpp_lib_stacktrace >= 202011L);

On GCC 15.2.1 / libstdc++, __cpp_lib_stacktrace is exactly 202011L. The types are small: std::stacktrace is 16 bytes — a pointer and a size, a std::vector-like handle to heap-allocated frame data. Each std::stacktrace_entry is 8 bytes, an opaque handle that resolves lazily when you call its accessors.

Compiler support as of April 2026: GCC 14+ works. You link with -lstdc++exp (that’s libstdc++exp.a on Fedora/RHEL). Older GCC docs mention -lstdc++_libbacktrace, which no longer exists as of GCC 15; I wasted twenty minutes on that before checking. MSVC has had <stacktrace> since VS 2022 17.4. Clang with libc++ is a different story: the header exists, but the types are stubs and nothing compiles against them.
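Because the hard static_assert stops the build on toolchains without the library, a softer probe can be handy while support is uneven. This is a minimal sketch, assuming you want a compile-time flag instead of a build failure; kHasStacktrace and stacktrace_backend() are hypothetical names, not part of any library.

```cpp
// Probe for <stacktrace> support without failing the build.
// <version> (C++20) exposes the library feature-test macros on its own.
#if __has_include(<version>)
#include <version>
#endif

#if defined(__cpp_lib_stacktrace) && __cpp_lib_stacktrace >= 202011L
#include <stacktrace>
constexpr bool kHasStacktrace = true;
#else
constexpr bool kHasStacktrace = false;
#endif

// Hypothetical helper: a label you might print in a startup banner.
constexpr const char* stacktrace_backend() {
    return kHasStacktrace ? "std::stacktrace" : "platform fallback";
}
```

Code that needs the real thing can then branch on kHasStacktrace (or on the macro itself) and keep the old platform-specific path only where the probe comes back false.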

The Entire API That Matters

#include <stacktrace>
#include <iostream>

void some_function() {
    auto trace = std::stacktrace::current();
    std::cout << trace << '\n';  // operator<< does the formatting
}

std::stacktrace::current() captures the call stack at the point of invocation. The returned object is a value type: it owns its data and is movable, copyable, and safe to store. That’s it. One call to capture, one << to print.

What does the output look like in practice? I compiled a five-level call chain (main → alpha → beta → gamma → delta) with g++ -std=c++23 -O2 -g -lstdc++exp, keeping -lstdc++exp after the source file on the command line because it is a static archive and link order matters. The trace contained 9 frames: 5 user frames with demangled names, source files, and line numbers, plus 4 runtime frames (__libc_start_call_main, __libc_start_main, _start, and one empty frame).

#include <cstddef>
#include <format>
#include <iostream>
#include <stacktrace>

[[gnu::noinline]]
void delta() {
    auto st = std::stacktrace::current();
    for (std::size_t i = 0; i < st.size(); ++i) {
        const auto& entry = st[i];
        std::cout << std::format("  [{}] {} — {}:{}\n",
                                 i,
                                 entry.description(),
                                 entry.source_file(),
                                 entry.source_line());
    }
}

Output (trimmed to user frames):

  [0] delta() — claim-01-basic-stacktrace.cpp:16
  [1] gamma() — claim-01-basic-stacktrace.cpp:31
  [2] beta()  — claim-01-basic-stacktrace.cpp:34
  [3] alpha() — claim-01-basic-stacktrace.cpp:37
  [4] main    — claim-01-basic-stacktrace.cpp:40

Names are demangled. Line numbers are 1-based and point to the call site, not the return address — the implementation adjusts for you. Paths are absolute (shortened here for sanity).

The -g Tax

Compile without -g and the output degrades: source_file() returns empty strings, source_line() returns 0 for every frame. Function names survive — they come from the ELF symbol table, not DWARF — but all file/line information is gone.

This is expected behavior. The libstdc++ backend resolves source locations from DWARF debug info. No -g, no DWARF, no locations.
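One way to keep log lines tidy in both cases is to format each frame defensively. This is a minimal sketch: format_frame and stripped_frame are hypothetical names, and the function is a template only so it can be shown with a stand-in type that has the same three accessors as std::stacktrace_entry.

```cpp
#include <cassert>
#include <string>

// Formats one frame, dropping the location part when debug info is missing.
template <class Entry>
std::string format_frame(const Entry& e) {
    std::string out = e.description();
    if (out.empty()) out = "<unknown>";   // e.g. an empty runtime frame
    const std::string file = e.source_file();
    if (!file.empty()) {                  // only present with debug info
        out += " at ";
        out += file;
        out += ':';
        out += std::to_string(e.source_line());
    }
    return out;
}

// Stand-in mimicking a frame from a binary built without -g:
// the name survives, file and line do not.
struct stripped_frame {
    std::string description() const { return "delta()"; }
    std::string source_file() const { return ""; }
    unsigned source_line() const { return 0; }
};

inline void format_frame_demo() {
    assert(format_frame(stripped_frame{}) == "delta()");
}
```

With a real trace you would call format_frame(entry) for each entry in the loop; the stand-in is there only to show the degraded case without needing a second build.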

The thing people get wrong: -g with -O2 does not change the generated code on GCC. It only adds DWARF sections. Same codegen, larger binary. If you strip before deployment, ship the debug info to a symbol server and use debuginfod to resolve traces after the fact. Or use -g1 for line tables only — minimal size overhead, enough for source_file() and source_line() to work.

The Three Accessors

Each std::stacktrace_entry exposes:

std::string         description() const;  // demangled function name (or address as string)
std::string         source_file() const;  // absolute path to source file (empty if no debug info)
std::uint_least32_t source_line() const;  // 1-based line number (0 if unknown)

Note that description() returns a std::string by value, which typically means a heap allocation per call. If you’re logging traces on a hot path (reconsider that decision), cache the result. The entries also support ==, <=>, and are hashable, so you can deduplicate traces in a std::unordered_set if you’re aggregating errors.
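Since std::hash is specialized for both std::stacktrace_entry and std::stacktrace, aggregation can be a plain hash map keyed on the trace. This is a minimal sketch: trace_counter is a hypothetical helper, written as a template so it can be exercised with any hashable stand-in, and instantiating it with std::stacktrace is an assumption about how you would use it.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>

// Hypothetical aggregator: counts how often each distinct key was recorded.
// Key is meant to be std::stacktrace, but any type with std::hash and
// operator== works, which keeps the sketch easy to try out.
template <class Key>
class trace_counter {
    std::unordered_map<Key, std::size_t> counts_;
public:
    // Returns how many times this key has been seen, including this call.
    std::size_t record(const Key& key) { return ++counts_[key]; }

    // Number of distinct keys recorded so far.
    std::size_t unique() const { return counts_.size(); }

    // Times a given key was recorded (0 if never).
    std::size_t count(const Key& key) const {
        auto it = counts_.find(key);
        return it == counts_.end() ? 0 : it->second;
    }
};

// Usage with an int standing in for a trace: two distinct keys, one repeated.
inline void trace_counter_demo() {
    trace_counter<int> errors;
    errors.record(42);
    errors.record(42);
    errors.record(7);
    assert(errors.unique() == 2);
    assert(errors.count(42) == 2);
}
```

With the real type this would be trace_counter<std::stacktrace>, and a reasonable policy is to log the full trace only when record() returns 1 and just bump the counter afterwards.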

Controlling Capture Depth

The full signature:

static stacktrace current(size_type skip = 0, size_type max_depth = /*max*/);

skip drops frames from the top. To cap the total, use max_depth. I tested both with a 7-level call chain (main → f1 → ... → f6) on GCC 15.2.1 (i7-4790, Fedora 43):

  • current(0, 20): 11 frames total (7 user + 4 runtime); frame[0] is f6()
  • current(1, 20): 10 frames; frame[0] is f5(), so f6 is correctly dropped
  • current(0, 3): exactly 3 frames: f6, f5, f4
  • current(2, 3): 3 frames starting at f4(), so both parameters compose

skip is how you hide your implementation details. If your logging function calls a helper that calls current(), pass skip = 2 so callers see their own frames, not yours. max_depth is how you bound the cost — and the cost matters.
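The way the two parameters compose can be written down as a toy model: skip first, then cap what is left. This sketch only mirrors the behaviour measured above; frame_window and window() are hypothetical names, and nothing here reflects how libstdc++ implements capture.

```cpp
#include <cstddef>

// Which slice of the raw stack survives current(skip, max_depth).
struct frame_window {
    std::size_t first;  // index of the first kept frame in the raw stack
    std::size_t count;  // number of frames kept
};

constexpr frame_window window(std::size_t raw_frames,
                              std::size_t skip,
                              std::size_t max_depth) {
    // Skipping more frames than exist leaves an empty trace.
    const std::size_t first = skip < raw_frames ? skip : raw_frames;
    const std::size_t remaining = raw_frames - first;
    // max_depth caps what is left after skipping.
    return frame_window{first, remaining < max_depth ? remaining : max_depth};
}

// The four measurements on the 11-frame stack (index 0 is f6, 1 is f5, 2 is f4).
static_assert(window(11, 0, 20).count == 11, "full capture");
static_assert(window(11, 1, 20).first == 1 && window(11, 1, 20).count == 10,
              "f6 dropped");
static_assert(window(11, 0, 3).first == 0 && window(11, 0, 3).count == 3,
              "capped at three");
static_assert(window(11, 2, 3).first == 2 && window(11, 2, 3).count == 3,
              "starts at f4");
```

The practical consequence is that max_depth counts frames after skipping, so a logging helper that passes skip = 2 and max_depth = 5 still hands its caller five useful frames.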

What It Costs

Stack trace capture involves unwinding (walking frame pointers or DWARF CFI), symbol resolution, demangling, and DWARF line-table lookup. I measured with Google Benchmark (5 repetitions, median values) on an i7-4790 at 3.6 GHz, GCC 15.2.1, -O2 -g:

Capture mode                Median time   Relative to full
No-op baseline              0.5 ns
Full capture (all frames)   2,613 ns      1.0×
Limited to 5 frames         1,492 ns      0.57×
Limited to 1 frame          757 ns        0.29×

2.6 µs for a full capture. About 9,400 clock cycles. Not something you want in a tight loop, but for error paths? A server handling 10,000 requests/second with one trace per error at a 1% error rate captures 100 traces per second and spends about 261 µs/second on stack traces, roughly 0.026% of one core. I wouldn’t even bother measuring that.

The cost scales with depth — cap at 5 frames and you’re at 1,492 ns (43% off the full price); cap at 1 frame and it’s 757 ns, a 71% reduction. If you’re using traces in a custom allocator or a frequently-hit assertion, max_depth is the knob you want.
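Working the budget arithmetic through with the measured medians makes it easy to plug in your own load. This is a back-of-the-envelope sketch; the inputs are the figures from the benchmark, and the helper names are made up for illustration.

```cpp
// Inputs taken from the benchmark above.
constexpr double kFullCaptureNs = 2613.0;  // median full capture
constexpr double kClockGHz      = 3.6;     // i7-4790 nominal clock

// Cycles spent in one capture: nanoseconds times cycles per nanosecond.
constexpr double cycles_per_capture(double capture_ns, double ghz) {
    return capture_ns * ghz;
}

// Microseconds spent tracing per wall-clock second at a given load.
constexpr double trace_us_per_second(double requests_per_s,
                                     double error_rate,
                                     double capture_ns) {
    return requests_per_s * error_rate * capture_ns / 1000.0;
}

// 2,613 ns at 3.6 cycles/ns is roughly 9,407 cycles.
static_assert(cycles_per_capture(kFullCaptureNs, kClockGHz) > 9406.0 &&
              cycles_per_capture(kFullCaptureNs, kClockGHz) < 9408.0,
              "about 9,407 cycles per capture");

// 10,000 req/s at 1% errors is 100 traces/s; 100 * 2,613 ns is about 261 us.
static_assert(trace_us_per_second(10000.0, 0.01, kFullCaptureNs) > 261.0 &&
              trace_us_per_second(10000.0, 0.01, kFullCaptureNs) < 262.0,
              "about 261 us of tracing per second");
```

Swapping in the 5-frame median (1,492 ns) gives about 149 µs per second for the same load, which is the kind of saving max_depth buys on a busier error path.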

Exceptions That Carry Their Own Traces

This is the pattern I actually use. By the time a catch block runs, the stack is unwound — the call chain that threw is gone. Capture at throw-time and the trace survives:

#include <stacktrace>
#include <stdexcept>
#include <string>
#include <utility>

class traced_error : public std::runtime_error {
    std::stacktrace trace_;
public:
    explicit traced_error(std::string msg)
        : std::runtime_error(std::move(msg))
        , trace_(std::stacktrace::current(1))  // skip the constructor frame
    {}

    const std::stacktrace& trace() const noexcept { return trace_; }
};

current(1) skips the constructor itself, so frame[0] is the function that threw.

try {
    outer();  // eventually calls inner(), which throws traced_error
} catch (const traced_error& e) {
    std::cerr << e.what() << "\n" << e.trace() << "\n";
}

Output:

something went wrong in inner()

  [0] inner()  — claim-04-exception-stacktrace.cpp:26
  [1] outer()  — claim-04-exception-stacktrace.cpp:33
  [2] main     — claim-04-exception-stacktrace.cpp:39

The throw site. Right there. No debugger, no core dump, no guesswork.

One thing that bit me during testing: if the compiler inlines the traced_error constructor into inner(), then current(1) skips past inner() to its caller. I had to mark the constructor [[gnu::noinline]] to keep the skip count predictable. In a production codebase, you either add that attribute or accept that the skip count might be off by one in optimized builds and handle it at the log-analysis level. I’d add the attribute.
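Spelled out, the attribute fix looks roughly like this. It is a sketch under two assumptions: the toolchain understands [[gnu::noinline]] (GCC and Clang do), and trace_type is a hypothetical alias that falls back to an empty stand-in where <stacktrace> is unavailable. On GCC the real path still needs -std=c++23 and -lstdc++exp at link time.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#if __has_include(<stacktrace>)
#include <stacktrace>
#endif

#if defined(__cpp_lib_stacktrace)
using trace_type = std::stacktrace;
#else
// Hypothetical stand-in so the sketch still builds without <stacktrace>;
// it records nothing.
struct trace_type {
    static trace_type current(std::size_t /*skip*/ = 0) noexcept { return {}; }
    std::size_t size() const noexcept { return 0; }
};
#endif

class traced_error : public std::runtime_error {
    trace_type trace_;
public:
    // noinline keeps the constructor as a real frame, so skipping exactly one
    // frame lands on the function that threw, even in optimized builds.
    [[gnu::noinline]] explicit traced_error(std::string msg)
        : std::runtime_error(std::move(msg))
        , trace_(trace_type::current(1))
    {}

    const trace_type& trace() const noexcept { return trace_; }
};

// Usage: throw, catch, and check that the message survived the trip.
inline void traced_error_demo() {
    try {
        throw traced_error("boom");
    } catch (const traced_error& e) {
        assert(std::string(e.what()) == "boom");
    }
}
```

The demo deliberately checks only the message: how many frames come back depends on the platform and build flags, so a frame count is not something to assert on.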

The Gotchas

Tail calls eat frames. With -O2, GCC can turn a call at the end of a function into a jump, removing that frame from the stack entirely. If your trace is missing expected frames, this is the usual suspect. You can prevent it with a side-effecting operation after the call (even a volatile write), or for a whole build with -fno-optimize-sibling-calls, but both are debugging workarounds, not production code. Accept that optimized traces may be shorter than source-level reasoning suggests.

Link flags are wrong in old docs. On GCC/libstdc++, the backend lives in libstdc++exp.a, linked with -lstdc++exp. Documentation from older GCC versions says -lstdc++_libbacktrace. That library name doesn’t exist on GCC 15. If your link fails with unresolved stacktrace symbols, check the library name for your version first.

No async stitching. current() captures the synchronous call stack. Coroutines, thread pools, std::async — the trace stops at the executor boundary. The continuation’s logical caller is not on the physical stack. C++26 doesn’t fix this. You need manual correlation IDs or structured logging to bridge async boundaries, same as before.

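Allocator awareness. std::stacktrace is actually std::basic_stacktrace<std::allocator<std::stacktrace_entry>>. You can supply your own allocator, for example to draw from a custom memory arena. Swapping the allocator alone does not make current() async-signal-safe, though; the standard gives no such guarantee. In practice, I’ve never needed to.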

Where It Fits

The traced_error pattern above is the killer application. Exception types in libraries, assertion macros that log before aborting, structured error logging where you want throw-site info without shipping core dumps. In debug builds, you get richer diagnostics without having to attach a debugger — and you get them retroactively from logs rather than having to reproduce the failure.

Where it doesn’t fit: hot loops, signal handlers (the default allocator is not async-signal-safe), anything that can’t afford ~2.6 µs per capture. For signal handlers, you’re still stuck with backtrace() and a pre-allocated buffer, or a raw frame-pointer walk.


The #ifdef tower I’ve been maintaining since 2009 can finally go. GCC and MSVC have full support. The cost is about 2.6 µs for a full capture and is bounded by max_depth. The one thing keeping me from deleting all the platform-specific code today is Clang: once libc++ ships its implementation, the last excuse dies.