Problem
mr store gc only removes unused worktrees (and skips in-use/dirty ones). It
never reclaims the dominant disk consumer: regenerable per-worktree artifacts
(target/, node_modules/pnpm store entries, tmp/ except tmp/worklog)
inside named-branch worktrees that are not currently in use. Those artifacts
can grow without bound and have caused disk-full incidents on build hosts.
Worktree-gc protects every refs/heads/* worktree unconditionally, so it cannot
help here. We need to prune just the regenerable artifacts of a worktree that is
cold — not referenced by any active workspace and not touched recently —
without ever removing source, .git, tmp/worklog, or anything of an in-use ref.
A manual stopgap (scan + prune of artifact dirs in cold refs) relieved the
incidents but re-derives "what is a worktree / is it in use" outside mr. mr owns
worktree identity and liveness, so the mechanism belongs here.
Design
Extend mr store gc with an orthogonal mode (reusing the existing liveness
registry, worktree walk, per-worktree lock, dry-run, and JSON output):
    mr store gc --prune-artifacts [--dry-run] [--json]
      [--artifact-class tmp|node_modules|target|all]   # default: all
      [--max-age-tmp 3d --max-age-node-modules 7d --max-age-target 14d]
      [--min-free <bytes>]   # stop once free space clears the budget
Cold definition (artifacts reclaimable when ALL hold)
- Not in-use: not in any workspace registry livePaths set and not the current
workspace root (reuse collectStoreLiveSet). Hard correctness gate. Unlike
worktree-gc, this applies to named-branch worktrees too.
- Stale: the worktree's age, measured from its freshness timestamp, exceeds the
per-class window. Freshness = most recent of worktree dir mtime and HEAD commit
date (any recent touch keeps it hot). mtime alone is unreliable under hardlinked
stores / git-rewrite, so commit date is the primary signal; an explicit
per-workspace "last active" touch is a later option.
- Dirty/unpushed source does NOT block artifact pruning (artifacts are
regenerable; source is untouched) — but is surfaced in dry-run output.
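The two gates above can be sketched as a single pure predicate. This is a minimal illustration, not mr's implementation: `WorktreeInfo`, `Verdict`, and `classifyWorktree` are hypothetical names, the live set is assumed to be a set of worktree paths, and the "current workspace root" check is simplified to a path comparison.

```typescript
// Illustrative shapes only; mr's real types and names may differ.
interface WorktreeInfo {
  path: string;
  dirMtimeMs: number; // worktree directory mtime (epoch ms)
  headCommitMs: number; // HEAD commit date (epoch ms)
}

// Reason labels mirror the skipped_in_use / skipped_hot reasons in this proposal.
type Verdict = "cold" | "skipped_in_use" | "skipped_hot";

const DAY_MS = 24 * 60 * 60 * 1000;

function classifyWorktree(
  wt: WorktreeInfo,
  windowMs: number, // per-class window, e.g. 14 * DAY_MS for target/
  livePaths: ReadonlySet<string>, // assumed output of the liveness registry
  currentWorkspaceRoot: string,
  nowMs: number,
): Verdict {
  // Gate 1 (hard correctness gate): never touch an in-use worktree.
  if (livePaths.has(wt.path) || wt.path === currentWorkspaceRoot) {
    return "skipped_in_use";
  }
  // Gate 2: freshness is the most recent of dir mtime and HEAD commit date,
  // so any recent touch keeps the worktree hot.
  const freshestMs = Math.max(wt.dirMtimeMs, wt.headCommitMs);
  return nowMs - freshestMs > windowMs ? "cold" : "skipped_hot";
}
```

Dirty or unpushed state is deliberately absent from the predicate: per the last bullet it is only reported, not used as a gate.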
Per-artifact restore-cost tiers (defaults in mr, overridable by host config)
| Class | Paths | Restore cost | Default window |
| --- | --- | --- | --- |
| tmp | tmp/ (≠ tmp/worklog) | ~zero | 3d |
| node_modules | node_modules/ only | low once a shared hardlink store exists | 7d |
| target | target/ | rebuild (cheap-not-free with a compile cache) | 14d |
Never touched: source, .git, tmp/worklog, any in-use ref. False-cold costs a
rebuild, never data. --prune-artifacts removes only per-worktree node_modules;
the shared content-addressed pnpm store is left to pnpm's own store prune.
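One way the tiers and their overrides could be represented is sketched below. Everything here is an assumption for illustration: the `Tier` shape, the `keep` list, `parseMaxAge`, `windowMs`, and `pathsToRemove` are hypothetical names, only the `Nd` duration form shown in the flag defaults is parsed, and only the worktree-root `node_modules/` is considered.

```typescript
type ArtifactClass = "tmp" | "node_modules" | "target";

interface Tier {
  root: string; // artifact dir, relative to the worktree root
  keep: string[]; // paths under root that must never be removed
  defaultWindow: string; // same "Nd" syntax as the --max-age-* flags
}

// Defaults taken from the tiers table.
const DEFAULT_TIERS: Record<ArtifactClass, Tier> = {
  tmp: { root: "tmp", keep: ["tmp/worklog"], defaultWindow: "3d" },
  node_modules: { root: "node_modules", keep: [], defaultWindow: "7d" },
  target: { root: "target", keep: [], defaultWindow: "14d" },
};

const DAY_MS = 24 * 60 * 60 * 1000;

// Parse a day-based duration such as "7d" into milliseconds.
function parseMaxAge(text: string): number {
  const match = /^(\d+)d$/.exec(text);
  if (match === null) throw new Error(`unsupported duration: ${text}`);
  return Number(match[1]) * DAY_MS;
}

// Host config or --max-age-* flags override the built-in default.
function windowMs(
  cls: ArtifactClass,
  overrides: Partial<Record<ArtifactClass, string>>,
): number {
  return parseMaxAge(overrides[cls] ?? DEFAULT_TIERS[cls].defaultWindow);
}

// Decide what to delete for one tier. With no protected children the whole
// root goes; otherwise only the non-protected top-level entries do.
function pathsToRemove(tier: Tier, rootEntries: readonly string[]): string[] {
  if (tier.keep.length === 0) return [tier.root];
  return rootEntries
    .map((name) => `${tier.root}/${name}`)
    .filter((path) => !tier.keep.includes(path));
}
```

Keeping the tmp/worklog exclusion in data rather than in the deletion code makes the "never touched" invariant checkable in one place.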
Safety
Re-check the in-use gate under the existing per-worktree lock before each delete
(same TOCTOU protection as worktree-gc). Emit scanned/removed/skipped/reclaimed
totals + per-worktree reason (skipped_in_use / skipped_hot), labelled by class.
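The re-check under the lock might look roughly like the sketch below. The dependencies are injected as plain synchronous functions so the example stays runnable; in mr they would be the existing lock and liveness helpers and would likely be asynchronous. `PruneDeps`, `PruneResult`, and `pruneWorktree` are hypothetical names, and the live set is again assumed to hold worktree paths.

```typescript
// Hypothetical stand-ins for mr internals, injected for testability.
interface PruneDeps {
  withWorktreeLock: <T>(worktree: string, body: () => T) => T;
  collectLiveSet: () => ReadonlySet<string>;
  removeDir: (path: string) => void;
}

interface PruneResult {
  worktree: string;
  removed: string[];
  reason: "skipped_in_use" | null; // null when every path was removed
}

function pruneWorktree(
  deps: PruneDeps,
  worktree: string,
  artifactPaths: readonly string[],
): PruneResult {
  return deps.withWorktreeLock<PruneResult>(worktree, () => {
    const removed: string[] = [];
    for (const path of artifactPaths) {
      // Re-read liveness before each delete: a workspace may have claimed
      // the worktree between the scan and now (TOCTOU).
      if (deps.collectLiveSet().has(worktree)) {
        return { worktree, removed, reason: "skipped_in_use" };
      }
      deps.removeDir(path);
      removed.push(path);
    }
    return { worktree, removed, reason: null };
  });
}
```

If the worktree becomes live partway through, the result reports both what was already removed and the skip reason, so the totals stay accurate.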
Consumer contract
Stable JSON output lets an external disk-hygiene timer call this in two ways,
pressure-aware (--min-free=<soft floor> → prune coldest/cheapest-first until above
the floor) and periodic (age-gated only), and emit metrics without re-deriving liveness.
mr owns "cold" and "what's an artifact"; the host owns floors, schedule, and
metric export, passing the floor only as the --min-free budget.
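The budget behaviour can be sketched as a planning step over already-cold candidates. The ordering below (cheapest restore class first, then oldest within a class) is one possible reading of "coldest/cheapest-first", not something this proposal pins down, and `Candidate`, `RESTORE_COST_RANK`, and `planPrune` are hypothetical names.

```typescript
type ArtifactClass = "tmp" | "node_modules" | "target";

interface Candidate {
  worktree: string;
  artifactClass: ArtifactClass;
  path: string;
  bytes: number; // reclaimable size of this artifact dir
  ageMs: number; // time since the worktree's freshness timestamp
}

// Lower rank = cheaper to restore, following the tiers table.
const RESTORE_COST_RANK: Record<ArtifactClass, number> = {
  tmp: 0,
  node_modules: 1,
  target: 2,
};

function planPrune(
  candidates: readonly Candidate[],
  freeBytes: number,
  minFreeBytes: number | null, // null = periodic mode, age-gated only
): Candidate[] {
  const ordered = [...candidates].sort(
    (a, b) =>
      RESTORE_COST_RANK[a.artifactClass] - RESTORE_COST_RANK[b.artifactClass] ||
      b.ageMs - a.ageMs,
  );
  if (minFreeBytes === null) return ordered;

  // Pressure-aware mode: stop as soon as projected free space clears the floor.
  const plan: Candidate[] = [];
  let projectedFree = freeBytes;
  for (const candidate of ordered) {
    if (projectedFree >= minFreeBytes) break;
    plan.push(candidate);
    projectedFree += candidate.bytes;
  }
  return plan;
}
```

A host timer would pass only the floor; when free space is already above it, the plan is empty and nothing is removed.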
Docs
Refine packages/@overeng/megarepo/docs/spec.md first: extend the mr store gc
section with the --prune-artifacts mode, cold definition, artifact-class tiers,
and JSON consumer contract; add an invariant (artifact reclamation never removes
source, .git, tmp/worklog, or artifacts of an in-use worktree; cold =
not-in-use AND stale; false-cold costs a rebuild, never data); add a Core Concept
distinguishing cold worktree (artifact-reclaimable) from named-ref protection
(worktree-deletion protection).
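Tasks
- docs/spec.md: --prune-artifacts mode, cold definition, tiers, invariant, concept
- --prune-artifacts in storeGcCommand: in-use gate (reuse collectStoreLiveSet) +
freshness gate + per-class artifact removal under withWorktreeLock
- --artifact-class, per-class --max-age-*, and --min-free budget flags
- tmp/worklog, source, and .git never touched; dry-run lists without deleting;
--min-free stops at budget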
Acceptance criteria
- A named-branch worktree not referenced by any active workspace and untouched
past its class window has only its artifact dirs removed; source/.git/
tmp/worklog remain intact.
- An in-use or recently-touched worktree is always skipped.
- --dry-run --json reports candidate paths, classes, and reclaimable bytes with zero deletions.
- --min-free prunes coldest/cheapest-first and stops once the budget is met.