Problem
Every tool that scans repos repeats the same five-step shape in its scan.rs:
1. Discover repos
2. (Optionally) parallel-fetch
3. Dispatch a per-repo closure via parallel_classify
4. The closure calls a tool-specific classify_* and assembles a tool-specific result struct
5. Aggregate counts and warnings into a ScanResult
The result is one shape, seven copies. The supposedly shared git-tidy-core/src/scan.rs is only 41 lines — it holds rayon glue, not the pipeline. The orchestration of discovery → fetch → parallelism → progress → aggregation has leaked into each tool.
Evidence
| Tool scan.rs | Lines |
| --- | ---: |
| git-repo-tidy | 579 |
| git-branch-tidy | 563 |
| git-lfs-tidy | 450 |
| git-stash-tidy | 437 |
| git-tag-tidy | 386 |
| git-worktree-tidy | 336 |
| git-remote-tidy | 311 |
| Shared git-tidy-core/src/scan.rs | 41 |
The real per-tool variation is the classifier: classify_branch, classify_tag, classify_remote_branch, etc. Everything else — fetch sequencing, progress bars, parallelism, error collection — is bookkeeping that doesn't change between tools.
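To put numbers on the duplication, the seven per-tool counts in the table sum to 3,062 lines. The sketch below redoes that arithmetic and works out what the reduction targets in the acceptance criteria would mean in absolute terms. The constant and function names are made up for illustration; only the line counts come from the issue.

```rust
/// Per-tool scan.rs line counts, copied from the Evidence table.
pub const SCAN_RS_LINES: [(&str, u32); 7] = [
    ("git-repo-tidy", 579),
    ("git-branch-tidy", 563),
    ("git-lfs-tidy", 450),
    ("git-stash-tidy", 437),
    ("git-tag-tidy", 386),
    ("git-worktree-tidy", 336),
    ("git-remote-tidy", 311),
];

/// Combined size of the seven per-tool scan.rs files.
pub fn total_lines() -> u32 {
    SCAN_RS_LINES.iter().map(|(_, n)| n).sum()
}

/// Largest combined size that still counts as a drop of at least `percent`
/// (integer division rounds down, which is the conservative direction).
pub fn max_lines_after_drop(total: u32, percent: u32) -> u32 {
    total * (100 - percent) / 100
}
```

Under these numbers, a drop of at least 60% leaves room for at most 1,224 lines across the seven files, and seven files of about 100 lines each (700 in total) would be a drop of roughly 77%.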
Desired end state
A deep ScanPipeline module in git-tidy-core owns discover, fetch, parallel dispatch, progress reporting, and result aggregation. Each tool plugs in a narrow Classifier (or equivalent) port that returns a tool-specific classified item. Per-tool scan.rs files shrink to the classifier impl and a thin call into the pipeline.
The interface across the seam becomes "give me a way to classify one repo's items, I'll give you the aggregated ScanResult<T>."
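One possible shape for that seam is sketched below. It is an illustration, not the proposed API: the trait methods, the ScanResult fields, run_scan, and the toy RepoNameClassifier are all assumptions, and std::thread::scope stands in for the rayon-based parallel_classify so the sketch needs no dependencies. Discovery, fetch, and progress reporting are left out; only dispatch and aggregation are shown.

```rust
use std::path::{Path, PathBuf};
use std::thread;

/// Aggregated output of one scan (field names are assumptions).
#[derive(Debug, Default)]
pub struct ScanResult<T> {
    pub items: Vec<T>,
    pub warnings: Vec<String>,
    pub repos_scanned: usize,
}

/// The narrow port each tool would implement (hypothetical shape).
pub trait Classifier: Sync {
    type Item: Send;
    /// Classify one repo's items. An Err becomes a per-repo warning.
    fn classify(&self, repo: &Path) -> Result<Vec<Self::Item>, String>;
}

/// Pipeline side of the seam: dispatch per repo, then aggregate.
pub fn run_scan<C: Classifier>(repos: &[PathBuf], classifier: &C) -> ScanResult<C::Item> {
    // One scoped thread per repo keeps the sketch small; a real pipeline
    // would presumably bound parallelism (e.g. with a thread pool).
    let outcomes: Vec<Result<Vec<C::Item>, String>> = thread::scope(|s| {
        let handles: Vec<_> = repos
            .iter()
            .map(|repo| s.spawn(move || classifier.classify(repo)))
            .collect();
        handles
            .into_iter()
            .map(|h| {
                h.join()
                    .unwrap_or_else(|_| Err("classifier panicked".to_string()))
            })
            .collect()
    });

    let mut result = ScanResult {
        items: Vec::new(),
        warnings: Vec::new(),
        repos_scanned: repos.len(),
    };
    // Outcomes are in the same order as `repos`, so zip pairs them correctly.
    for (repo, outcome) in repos.iter().zip(outcomes) {
        match outcome {
            Ok(items) => result.items.extend(items),
            Err(e) => result.warnings.push(format!("{}: {}", repo.display(), e)),
        }
    }
    result
}

/// Toy classifier for illustration only: emits the repo's directory name.
pub struct RepoNameClassifier;

impl Classifier for RepoNameClassifier {
    type Item = String;
    fn classify(&self, repo: &Path) -> Result<Vec<String>, String> {
        match repo.file_name() {
            Some(name) => Ok(vec![name.to_string_lossy().into_owned()]),
            None => Err("path has no final component".to_string()),
        }
    }
}
```

With this shape, a tool's scan.rs would reduce to its Classifier impl plus a call such as run_scan(&repos, &RepoNameClassifier), and a failure in one repo surfaces as a warning rather than aborting the whole scan.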
Where
crates/git-tidy-core/src/scan.rs — currently a 41-line rayon glue layer; expand into the deep pipeline
crates/git-tidy-core/src/{fetch.rs, discovery.rs, progress.rs} — likely consumed internally by the new pipeline
crates/git-tidy-core/src/classification.rs — defines Classification + classify_branch + classify_worktree; the port lives near these
crates/git-{branch,tag,stash,remote,repo,lfs,worktree}-tidy/src/scan.rs — each collapses to a classifier impl + call into core
crates/git-config-tidy is the exception (lint-not-scan) and stays as-is
Constraints
--match substring filter (worktree-tidy) and discovery override flags must remain wired through the pipeline's options.
Fetch ordering: worktree-tidy and branch-tidy currently fetch before classification (via parallel_fetch); the pipeline must let tools opt in to fetch.
Landed-detection caching is currently per-repo, per-branch — keep that locality. The cache lives behind the pipeline, not above it.
Progress bar behaviour (terminal vs JSON vs porcelain) must not regress.
All existing integration tests must pass without behaviour change.
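The fetch opt-in constraint could be met with a defaulted hook on the tool side, as in the sketch below. Everything here is hypothetical: Phase, FetchPolicy, plan_phases, and the two example tool types are invented for illustration, and the rule that an offline flag suppresses fetch is an assumption rather than something the issue states.

```rust
/// Phases the pipeline can run, in execution order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    Discover,
    Fetch,
    Classify,
    Aggregate,
}

/// Hypothetical per-tool hook: fetching is opt-in, so the default is false.
pub trait FetchPolicy {
    fn wants_fetch(&self) -> bool {
        false
    }
}

/// Stand-in for a tool that fetches before classification
/// (the issue names branch-tidy and worktree-tidy as doing so today).
pub struct FetchingTool;
impl FetchPolicy for FetchingTool {
    fn wants_fetch(&self) -> bool {
        true
    }
}

/// Stand-in for a tool that never fetches; it keeps the default.
pub struct NonFetchingTool;
impl FetchPolicy for NonFetchingTool {}

/// Decide which phases run. Assumption: `offline` always wins over opt-in.
pub fn plan_phases(tool: &dyn FetchPolicy, offline: bool) -> Vec<Phase> {
    let mut phases = vec![Phase::Discover];
    if tool.wants_fetch() && !offline {
        phases.push(Phase::Fetch);
    }
    phases.push(Phase::Classify);
    phases.push(Phase::Aggregate);
    phases
}
```

Keeping the decision in one function means fetch always runs before classification when it runs at all, which is the ordering the two fetching tools rely on.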
Suggested approach
Identify the union of options across the seven scanners — fetch flag, behind threshold, age threshold, offline flag, repo filter — and design a ScanOptions once.
Design Classifier (or ScanWorker) around: input = repo + options; output = Vec<TidyItem> + per-repo warnings (where TidyItem is the trait introduced in Implement git-worktree-tidy CLI #1).
Land the pipeline behind tests in core.
Migrate tools one at a time, deleting per-tool orchestration as each lands.
Update CLAUDE.md "Core patterns" to describe the ScanPipeline seam.
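As a starting point for the first step, a single ScanOptions covering the listed union might look like the sketch below. The field names and types are guesses (for example, whether thresholds are optional, or whether the repo filter is a substring match like --match), so treat it as a strawman to react to rather than the settled design.

```rust
use std::time::Duration;

/// Union of the options listed above. Field names and types are assumptions.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ScanOptions {
    /// Tool has opted in to fetching before classification.
    pub fetch: bool,
    /// Never touch the network, even if `fetch` is set.
    pub offline: bool,
    /// "Behind" threshold in commits; None means the tool does not use it.
    pub behind_threshold: Option<u32>,
    /// Age threshold; None means the tool does not use it.
    pub age_threshold: Option<Duration>,
    /// Substring filter on repo names; None selects every repo.
    pub repo_filter: Option<String>,
}

impl ScanOptions {
    /// True when the repo passes the (assumed substring) filter.
    pub fn repo_selected(&self, repo_name: &str) -> bool {
        self.repo_filter
            .as_deref()
            .map_or(true, |needle| repo_name.contains(needle))
    }
}
```

A tool would then set only what it cares about and default the rest, for example ScanOptions { fetch: true, ..ScanOptions::default() }, which keeps each per-tool call site short as new options are added.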
Acceptance criteria
One ScanPipeline implementation in git-tidy-core runs discover → fetch → parallelise → classify → aggregate for every scan-shaped tool.
Each tool's scan.rs is ≤ ~100 lines (classifier impl + thin pipeline call).
All existing integration tests pass; no snapshot diff beyond test-suite consolidation.
Total per-tool scan.rs LOC drops by ≥ 60%.
git-tidy-core/src/scan.rs is no longer "just rayon glue" — it carries the pipeline.
CLAUDE.md "Core patterns" section updated.
Notes for implementer
This issue is a deepening, not a rewrite. The pipeline is where complexity concentrates. The narrow classifier port is the interface across which the seven existing classifiers already differ.
… TidyItem interface is settled when the pipeline starts producing rows. Without Implement git-worktree-tidy CLI #1, the pipeline either invents a parallel row type or rewrites it again later.
The deletion test: if you delete the seven scan.rs files, the orchestration must reappear somewhere, in one place, in core, behind one interface. The leverage is high: bug fixes to fetch sequencing or progress reporting land once and pay back across seven tools.
Related
Part of the architectural review: