Status: implemented. Audience: compiler/runtime contributors.
Drop/lifetime bugs in compiled With programs (double-free, use-after-free, leak) were being diagnosed from black-box run counts: run a repro, read the exit count, guess the mechanism. On the #606 inline-drop-field leak, that approach produced four contradictory characterizations before the truth settled. The cure is an instrument that narrates the allocator's own behavior at the instruction level: which buffer was allocated where, freed where, freed again where, or never freed.
The With runtime is a custom slab allocator: it rt_mmaps anonymous pages once and
hands out size-classed sub-blocks from inside them (rt_alloc / free_small_block in
rt/rt_core.w). It never calls libc malloc/free.
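To make the sub-block structure concrete, here is a toy model of a size-classed slab. It is written in Python purely as modelling notation; it is not With code and does not reflect the actual layout in rt/rt_core.w. The names ToySlab, base, block_size, and the LIFO freelist are illustrative assumptions. The sketch shows why an unchecked double free is dangerous in this kind of design: the same block lands on the freelist twice, so two later allocations receive the same address.

```python
class ToySlab:
    """Toy model: one 'mapped' region carved into fixed-size sub-blocks."""

    def __init__(self, base: int, block_size: int, count: int):
        # Hypothetical layout: blocks are contiguous offsets inside one region.
        # Reversed so that pop() hands out the lowest address first.
        self.freelist = [base + i * block_size for i in reversed(range(count))]

    def alloc(self) -> int:
        # Pop the most recently freed (or next unused) block.
        return self.freelist.pop()

    def free(self, addr: int) -> None:
        # No liveness check: this models the unguarded behavior.
        self.freelist.append(addr)


slab = ToySlab(base=0x1000, block_size=32, count=4)
a = slab.alloc()
slab.free(a)
slab.free(a)  # double free: `a` is now on the freelist twice
b = slab.alloc()
c = slab.alloc()
print(hex(a), hex(b), hex(c))  # all three are the same address
```

From the outside, every one of these addresses sits inside the same mapped region, which is why a tool that only sees the region cannot tell the second free from the first.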
ASan derives its power from interposing libc malloc/free and redzoning each allocation
in shadow memory. To ASan, our whole mmap region is one opaque blob — it cannot see the
thousands of logical sub-allocations inside it, so a double-free of a sub-block is invisible
and a use-after-free reads "valid mapped memory." This was confirmed by linking a known
double-free with -fsanitize=address: no report.
This is not a flag we forgot — it is architectural, and it collides with a hard project constraint:
With is 100% self-hosted: zero C in the compiler. With + inline asm + syscalls only. This is not a debug-build exception.
That constraint settles the design space permanently:
- Route allocations through libc malloc/free so ASan interposes them. Rejected: a permanent C allocator dependency.
- Valgrind. Rejected: external C tool (and unsupported on ARM64, our primary platform).
- Go-style ASan annotation of the custom slab (__asan_poison_memory_region / __asan_unpoison_memory_region + redzones). This is the standard technique for making a custom allocator visible to ASan, and Go does exactly this for its mspan slab. Rejected as specced, because __asan_* are calls into ASan's C runtime.
What the reference languages do, for the record (.reference/): Rust ships no custom
allocator (libc System alloc → ASan "just works") plus Miri, a separate interpreter.
Go annotates its custom slab for ASan (the C-calling path we forbid). Zig ships a
native DebugAllocator (lib/std/heap/debug_allocator.zig) — pure Zig, cross-platform, no
C — whose feature list is essentially this document: stack traces on alloc and free,
double-free reporting all three traces (alloc, first free, second free), leak detection.
The only option that is constitutive of the no-C goal rather than in tension with it is
a native debug allocator written entirely in .w (plus inline asm / syscalls, exactly
as the runtime already does). It is the same discipline as the runtime migration: replace a
C capability with a With one we own. We adopt the Zig model in .w.
If sanitizer-ecosystem interop is pursued later, it is permitted only by no-C means:
emit ASan's shadow-memory format with direct memory writes / inline asm (the
Zig-Valgrind-client-request pattern — talk to the tool via its in-memory ABI, not its C
API — adapted to ASan's shadow). The C-calling form (__asan_poison, linking ASan's
runtime) is permanently out. "ASan annotation later" survives only in its no-C form.
Do not let a future contributor read "no C" as "ASan is impossible" — it is possible, just
only the hard way.
Because we own codegen too, the ledger carries the MIR origin of each emitted Drop, so a double-free abort names which drop freed the block first and which drop freed it again -- collapsing a multi-day characterization into one line. No external sanitizer can do this.
All additive to rt/rt_core.w; pure .w.
Why no in-process backtraces. The original design captured alloc/free backtraces by walking the frame-pointer chain (read x29 via inline asm, follow *(fp) / *(fp+8)). Running it showed that this does not work in the current With codegen: x29 does not point to a standard aarch64 frame record (reading [fp] / [fp+8] yields 0 even inside a framed, @[noinline] function). With's codegen does not maintain a walkable fp chain at the opt levels with run uses. Rather than chase a codegen change (out of scope), the first cut split the work: the ledger does detection in-process (cheap, robust, pure .w), and lldb resolves source sites out-of-process, conditioned on the address the ledger reports, because lldb uses real DWARF/compact-unwind unwinding that actually works. Compiler-emitted Drop sites now also carry MIR-origin tags directly into the free path, so double-free reports name both drops without requiring unwinding.
- Ledger. A side table backed by a direct rt_mmap region (never recursing through the instrumented allocator), an open-addressing hash keyed by payload address: {addr, size, freed_flag, alloc_origin, first_drop_ptr, first_drop_len, root_flag, root_reason}. Guarded by the existing allocator lock. The gate is read once via the non-allocating rt_getenv (with_getenv_str would deadlock the non-reentrant allocator lock) and cached.
- Instrumented alloc/free. rt_alloc records (or, on address reuse, resets) an entry. Tagged front doors such as with_alloc, Vec buffer growth, channel allocation, and fiber record allocation store a coarse allocation-origin token. rt_free looks up the entry before the existing ownership check: if already freed -> double-free, print debug-alloc: DOUBLE FREE addr=<a> size=<n> origin=<site> first_drop=<tag> second_drop=<tag> and abort (exit 134); else mark freed and remember the first drop tag. This sees freelist double-pushes the existing rt_payload_start_can_be_owned panic can miss.
- Scribble on free (opt-in: WITH_DEBUG_ALLOC_SCRIBBLE). Freed small payloads are overwritten with 0xDE so use-after-free reads corrupt loudly. It is off by default because, for a Vec[Drop] buffer, poisoning the freed payload turns a subsequent double-drop's element read into a use-after-free crash before the ledger reports the buffer's double-free, masking the clean verdict. Enable it to hunt use-after-free specifically. The freelist link lives in the header word (payload-16), untouched. (Not "never-reuse": that would break the slab; a never-reuse UAF mode is a later refinement.)
- Leak at exit. with_runtime_shutdown (on the native with run/build exit path) prints debug-alloc: LEAK addr=<a> size=<n> origin=<site> for every still-live entry, then a leak count=<k> line. Runtime code can call with_debug_alloc_mark_root(ptr, reason_ptr, reason_len) to label an intentional process-lifetime root. WITH_DEBUG_ALLOC_FILTER=all|non-root|roots controls whether all leaks, only non-root leaks, or only root leaks are printed. A field-drop that never fires shows up as a live non-root entry (the slab's freelist recycle is recorded as a free, so it is not a false leak).
- Site resolution (harness). The in-process report names the coarse origin token directly, and double-free reports name first/second compiler Drop tags when the free came through generated drop code. When an exact source line is needed, the driver can still run lldb conditioned on the address (break on rt_alloc returning it / on rt_free/with_vec_free taking it, bt at each) to name precise alloc and free call sites.
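The ledger's bookkeeping can be modelled in a few lines. The following is a minimal sketch in Python, used only as modelling notation; the real ledger is pure .w inside rt/rt_core.w. Several things here are assumptions made for illustration: a dict stands in for the open-addressing table, an exception stands in for the abort, the origin tokens and drop tags ("with_alloc", "drop#1") are invented, and the count line is assumed to reflect the filtered list.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


class DoubleFree(Exception):
    """Stands in for the abort (exit 134) described above."""


@dataclass
class Entry:
    size: int
    origin: str
    freed: bool = False
    first_drop: Optional[str] = None
    root_reason: Optional[str] = None  # set when marked as a root


class Ledger:
    def __init__(self) -> None:
        # A dict stands in for the open-addressing table keyed by payload address.
        self.entries: Dict[int, Entry] = {}

    def on_alloc(self, addr: int, size: int, origin: str) -> None:
        # Record a new entry, or reset the old one when the slab reuses an address.
        self.entries[addr] = Entry(size=size, origin=origin)

    def on_free(self, addr: int, drop_tag: str) -> None:
        entry = self.entries.get(addr)
        if entry is None:
            return  # untracked address: left to the allocator's own checks
        if entry.freed:
            raise DoubleFree(
                f"debug-alloc: DOUBLE FREE addr={addr:#x} size={entry.size} "
                f"origin={entry.origin} first_drop={entry.first_drop} "
                f"second_drop={drop_tag}"
            )
        entry.freed = True
        entry.first_drop = drop_tag

    def mark_root(self, addr: int, reason: str) -> None:
        self.entries[addr].root_reason = reason

    def leak_report(self, mode: str = "all") -> List[str]:
        lines = []
        for addr, e in sorted(self.entries.items()):
            if e.freed:
                continue
            is_root = e.root_reason is not None
            if (mode == "non-root" and is_root) or (mode == "roots" and not is_root):
                continue
            lines.append(
                f"debug-alloc: LEAK addr={addr:#x} size={e.size} origin={e.origin}"
            )
        lines.append(f"leak count={len(lines)}")
        return lines


ledger = Ledger()
ledger.on_alloc(0x1000, 32, "with_alloc")
ledger.on_alloc(0x1020, 64, "vec_grow")
ledger.mark_root(0x1020, "interned strings")
ledger.on_free(0x1000, "drop#1")
try:
    ledger.on_free(0x1000, "drop#2")
except DoubleFree as err:
    print(err)
print("\n".join(ledger.leak_report("non-root")))
```

The point of the model is the ordering: the freed flag is checked before anything else touches the block, so the second free is reported with both drop tags instead of silently corrupting the freelist.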
The mechanism is a single cached bool — runtime-gated, not a build variant (a build
flag → comptime-define → conditional-compile path is bootstrap/fixpoint risk for no benefit
that matters here; the cost it avoids is one cached branch and a few KB of inert .w).
Runtime gating also means the instrument works on any existing binary, including release,
with no rebuild — the right shape for a tool whose purpose is to end the
rebuild-and-characterize loop.
Two front doors set the same cached bool:
- --debug-alloc: a first-class, documented CLI flag (shows in --help). It does not thread a comptime define; on with run it simply sets WITH_DEBUG_ALLOC=1 in the child's environment before exec, which the child's runtime reads.
- WITH_DEBUG_ALLOC: the env var, for the harness driver and for toggling an already-built binary without rebuilding.
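The flag-to-environment hand-off is the ordinary "set a variable for the child only" pattern. As a rough illustration in Python (not the driver's actual code, which is .w; run_with_debug_alloc is a hypothetical helper and the child here is a stand-in Python one-liner rather than a With binary):

```python
import os
import subprocess
import sys


def run_with_debug_alloc(argv):
    """Launch argv with the debug-allocator gate set in the child's environment only."""
    env = dict(os.environ)  # copy, so the parent's environment is left untouched
    env["WITH_DEBUG_ALLOC"] = "1"
    return subprocess.run(argv, env=env, capture_output=True, text=True)


# Stand-in child: echoes the value of the gate variable as it sees it.
child = [sys.executable, "-c", "import os; print(os.environ.get('WITH_DEBUG_ALLOC'))"]
result = run_with_debug_alloc(child)
print(result.stdout.strip())
```

Because the switch travels in the environment, the child binary itself is identical whether or not the flag was passed.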
Leak filtering is controlled separately by --debug-alloc-filter=<mode> or
WITH_DEBUG_ALLOC_FILTER, where mode is all, non-root, or roots. The default is
all. Use non-root when process-lifetime roots would otherwise hide a real leak.
The gate is read once and cached (never getenv per allocation — that would be a
hot-path regression fixpoint cannot catch). When off, the cost is one cached-bool branch in
the alloc/free path; the dormant instrumentation is byte-identical at build time (the build
never sets the env), so fixpoint is unaffected.
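The read-once-and-cache shape looks roughly like the following. This is a Python sketch for illustration only, not the runtime's code: debug_alloc_enabled and _gate are hypothetical names, and treating exactly "1" as the enabling value is an assumption (the text above only says that --debug-alloc sets WITH_DEBUG_ALLOC=1).

```python
import os

_gate = None  # None = not read yet; afterwards True or False


def debug_alloc_enabled(environ=os.environ) -> bool:
    """Read the gate from the environment on first use, then reuse the cached value."""
    global _gate
    if _gate is None:
        # Assumption: only the literal value "1" enables the instrument.
        _gate = environ.get("WITH_DEBUG_ALLOC") == "1"
    return _gate


env = {"WITH_DEBUG_ALLOC": "1"}
first = debug_alloc_enabled(env)
env["WITH_DEBUG_ALLOC"] = "0"  # later changes are not observed
second = debug_alloc_enabled(env)
print(first, second)
```

After the first call, every later check is a single comparison against the cached value, which is the "one cached-bool branch" cost described above.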
General principle this establishes: diagnostic capabilities get a discoverable CLI flag
even when runtime-gated — mechanism (runtime switch) and discoverability (a --help flag)
are independent axes.
tools/debug_drop.w (pure .w, no shell script): runs a repro or fixture corpus under the
debug allocator and parses the ledger/abort/leak output into a verdict. The
:debug-alloc-tests target builds it to out/debug-alloc-tests/debug_drop and runs it in
check mode over the committed corpus. For one-off repros:

    ./out/release/bin/with build tools/debug_drop.w -o out/debug-alloc-tests/debug_drop
    out/debug-alloc-tests/debug_drop run ./out/release/bin/with repro.w

Source sites are resolved separately with lldb command files (the lldb on the dev box has
no script interpreter, so no Python): tools/debug_drop_sites.lldb for alloc/free sites and
tools/debug_drop_fields.lldb when the allocator verdict points at a drop/codegen bug.
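The "parse the output into a verdict" step amounts to matching the report lines quoted earlier. The sketch below models it in Python for illustration; the real harness is tools/debug_drop.w in pure .w, and its actual verdict names and parsing rules are not shown in this document. The verdict strings, the assumption that field values contain no whitespace, and the sample inputs are all invented.

```python
import re

DOUBLE_FREE = re.compile(
    r"debug-alloc: DOUBLE FREE addr=(\S+) size=(\d+) origin=(\S+) "
    r"first_drop=(\S+) second_drop=(\S+)"
)
LEAK = re.compile(r"debug-alloc: LEAK addr=(\S+) size=(\d+) origin=(\S+)")
LEAK_COUNT = re.compile(r"leak count=(\d+)")


def verdict(output: str) -> str:
    """Reduce debug-allocator output to a one-line verdict (hypothetical wording)."""
    m = DOUBLE_FREE.search(output)
    if m:
        return f"double-free at {m.group(1)} ({m.group(4)} then {m.group(5)})"
    count = LEAK_COUNT.search(output)
    if count is None:
        return "no-report"  # e.g. the process exited without printing a summary
    n = int(count.group(1))
    if n == 0:
        return "clean"
    origins = ", ".join(leak.group(3) for leak in LEAK.finditer(output))
    return f"leak x{n}: {origins}"


leaky = "debug-alloc: LEAK addr=0x1000 size=32 origin=with_alloc\nleak count=1\n"
double = (
    "debug-alloc: DOUBLE FREE addr=0x1000 size=32 origin=with_alloc "
    "first_drop=drop#1 second_drop=drop#2\n"
)
print(verdict(leaky))
print(verdict(double))
print(verdict("leak count=0\n"))
```

A missing summary line is kept distinct from a clean run, since a run that never reaches the leak report is not evidence of zero leaks.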
test/debug_alloc/ is also the regression gate for #607. Its inline-drop field
fixtures must all report leak count=0 and never DOUBLE FREE, including the
field-receiver push-tail and field-chaining cases that the ordinary floor cannot
see. da_manual_double_free intentionally remains a DOUBLE FREE fixture, and
da_drop_origin_double_free intentionally checks that generated drops report
first_drop=/second_drop= tags. da_root_filter marks a live allocation as a root and
runs under //! debug-alloc-filter: non-root, proving root leaks can be suppressed without
suppressing ordinary leaks. da_pod_vec intentionally remains leak count=1 for #608.
- Leak-report noise. The runtime intentionally never frees some allocations (interned strings, arg buffers). Marked roots plus --debug-alloc-filter=non-root suppress known process-lifetime roots, but unmarked roots still appear in the raw leak list until they are classified.
- Exact source sites still need a second (lldb) pass. The ledger names the block (address + size), the verdict (double-free / leak), the allocation-origin token, and generated Drop tags in-process; exact source lines still come from the harness's lldb pass conditioned on that address. In-process backtraces are not used (see the note above).
- Abnormal exit. Leak-at-exit fires on normal termination via with_runtime_shutdown; exits via rt_exit/panic skip the report.
- A user-frame filter for clean leak attribution.
- No-C ASan-shadow emission for sanitizer-ecosystem interop (the hard-way path above).
- Never-reuse-address UAF mode.