diff --git a/CLAUDE.md b/CLAUDE.md index beed636..50fa8c9 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -45,7 +45,7 @@ This is a Cargo workspace with a shared core library and seven binary crates. - **Scan pipeline seam** (`git-tidy-core/src/scan.rs`): Two layers. **Layer 1** `parallel_classify(repo_paths, classify_fn, label, progress)` runs a per-repo classifier closure in parallel (rayon) with progress + warning collection over any group type `G`; used directly by `git-repo-tidy`, `git-lfs-tidy`, and `git-config-tidy`. **Layer 2** `run_pipeline(...)` is the high-level seam for the five uniform tools (remote, tag, stash, branch, worktree): optional `parallel_fetch` (gated by `ScanOptions { fetch }`), `parallel_classify` dispatch wrapping each non-empty `Vec` into a `RepoGroup` (empty groups dropped), `Counts` aggregation via `Classified::classification_label`, and `ScanResult` assembly. Generic `RepoGroup`/`ScanResult` replace each uniform tool's former hand-rolled `*RepoGroup`/`*ScanResult` (exposed per tool as `pub type *ScanResult = ScanResult<*Info>`). The two exception tools keep bespoke results (repo-tidy flat with disk metrics + a cross-cutting `dirty` count; lfs-tidy with `LfsRepoGroup`/`lfs_installed`) and stay on Layer 1. - **Generic counts** (`git-tidy-core/src/counts.rs`): one `Counts` newtype over `BTreeMap` (`increment`/`get`/`total`/`iter`/`from_pairs`) replaces all per-tool `*Counts` structs and the `define_counts!` macro; keyed by `ClassificationLabel::label()` strings (also the audit-JSON keys). `output::write_summary_line` renders a tool's human summary from a `Counts` plus an ordered `&[(display_word, count_key)]` spec. - **Library-first design**: `scan.rs` and `clean.rs` are library functions; `main.rs` is thin dispatch. Enables `git-tidy` to call each tool's scan/lint as a library. -- **`CleanPipeline` seam** (`git-tidy-core/src/clean.rs`): `run_clean(items, decide, act, out) -> Result, Error>` owns the shared clean loop — iterate → pre-filter → act → aggregate `succeeded`/`failed`/`skipped` — exactly once. It is generic over the item type because five of seven tools classify on their own enums, not the shared `Classification`. Each tool supplies a `decide` predicate (the pure, silent classification filter — `Decision::Clean`/`Skip`; the per-tool `should_clean`) and an `act` closure that owns everything tool-specific: dry-run-vs-real branching and the exact "would …" wording, any IO guard (returning `Outcome::Skipped` after printing its own warning), the actual delete/prune, and construction of the success record. `act` returns `Outcome::{Cleaned(S), Skipped, Failed(FailedItem)}`; it returns `Err` only for genuinely fatal errors (an `out` write failure, a hard removal failure), which short-circuit the loop. Aggregate extras the pipeline doesn't model (e.g. repo-tidy's `dirty_blocked`) stay tool-side: `act` captures a `&mut` and the tool wraps the returned `CleanResult` (see `RepoCleanResult`). Migrated so far: `git-repo-tidy`. The other six tools still carry their own clean loops (issue #118 step 3). +- **`CleanPipeline` seam** (`git-tidy-core/src/clean.rs`): `run_clean(items, decide, act, out) -> Result, Error>` owns the shared clean loop — iterate → pre-filter → act → aggregate `succeeded`/`failed`/`skipped` — exactly once. It is generic over the item type because five of seven tools classify on their own enums, not the shared `Classification`. Each tool supplies a `decide` predicate (the pure, silent classification filter — `Decision::Clean`/`Skip`; the per-tool `should_clean`) and an `act` closure that owns everything tool-specific: dry-run-vs-real branching and the exact "would …" wording, any IO guard (returning `Outcome::Skipped` after printing its own warning), the actual delete/prune, and construction of the success record. `act` returns `Outcome::{Cleaned(S), Skipped, Failed(FailedItem)}`; it returns `Err` only for genuinely fatal errors (an `out` write failure, a hard removal failure), which short-circuit the loop. Aggregate extras the pipeline doesn't model (e.g. repo-tidy's `dirty_blocked`, lfs-tidy's per-group recommendations) stay tool-side: `act` captures a `&mut` (or the tool computes them around the call) and the tool wraps the returned `CleanResult` (see `RepoCleanResult`/`LfsCleanResult`). All seven tools now use this seam — each tool's `clean.rs` is a thin `decide`/`act` adapter over `run_clean`; branch- and worktree-tidy additionally share the `Classification`-typed `should_clean_landed` filter, and grouped tools call `run_clean` once per repo group (stash pre-sorts each group's items descending to preserve drop order). - **`CachingGitOps`** (`git-tidy-core/src/caching.rs`): `GitOps` wrapper that memoizes read-only queries (`fetch_prune`, `symbolic_ref_origin_head`, `rev_parse_verify`, `list_local_branches`, `list_remotes`, `ls_remote_check`, `list_builtin_commands`, `lfs_installed`, `log_grep`, `diff_commit`, `diff_commit_files`, `diff_commit_on_ref`) via `Mutex`. Used by the in-process audit runner to avoid redundant git calls across tools. A `delegate_git_ops!` macro forwards uncached methods to the inner `GitOps`. ### CLI pattern (shared `resolve_directory`, per-crate `clap` definitions) diff --git a/crates/git-lfs-tidy/src/clean.rs b/crates/git-lfs-tidy/src/clean.rs index 52ff61b..745b73c 100644 --- a/crates/git-lfs-tidy/src/clean.rs +++ b/crates/git-lfs-tidy/src/clean.rs @@ -2,12 +2,13 @@ use std::io::Write; use std::ops::Deref; use std::path::PathBuf; +use git_tidy_core::clean::{Decision, Outcome, run_clean as core_run_clean}; use git_tidy_core::error::Error; use git_tidy_core::git::GitOps; use git_tidy_core::types::{CleanResult, FailedItem}; use crate::output::format_bytes; -use crate::types::{LfsClassification, LfsScanResult}; +use crate::types::{LfsClassification, LfsInfo, LfsScanResult}; /// Options controlling LFS cleanup behavior. pub struct CleanOptions { @@ -28,7 +29,8 @@ pub struct LfsCleanResult { pub result: CleanResult, /// Recommendations printed to the user (e.g. `git lfs migrate`). Already /// emitted inline during the pass; retained here for tests and future JSON - /// output. A candidate to drop once lfs moves onto the shared CleanPipeline. + /// output. Computed tool-side around the per-group `run_clean` call, since the + /// shared pipeline does not model per-group aggregate extras. #[allow(dead_code)] pub recommendations: Vec, } @@ -52,92 +54,121 @@ pub struct PrunedRepo { pub dry_run: bool, } +/// Determine if an LFS item should be acted on by `act`. +/// +/// Only orphaned objects are prunable, and only when `--prune` is set. Every +/// other classification (untracked, missing, healthy) is skipped here; the +/// pipeline counts the rejection as a skip, matching the historical +/// `skipped += 1; continue;`. Untracked/missing items still drive per-group +/// recommendations, which are detected tool-side around the per-group call. +fn should_clean(item: &LfsInfo, options: &CleanOptions) -> bool { + matches!(item.classification, LfsClassification::Orphaned) && options.prune +} + /// Run the clean operation on a scan result. +/// +/// lfs groups items by repo. We call the shared [`core_run_clean`] loop once per +/// group over that group's items, accumulating each group's [`CleanResult`] into +/// a single aggregate. The per-group health recommendations are not modeled by the +/// pipeline, so they stay tool-side: we detect `has_untracked` / `has_missing` +/// from the group's classifications and emit the recommendation lines *after* that +/// group's `core_run_clean` call. This keeps a group's prune lines, then its +/// recommendation lines, contiguous before the next group is processed. pub fn run_clean( git: &dyn GitOps, scan_result: &LfsScanResult, options: &CleanOptions, out: &mut dyn Write, ) -> Result { - let mut pruned = Vec::new(); + let mut succeeded = Vec::new(); let mut failed = Vec::new(); - let mut skipped = 0; + let mut skipped = 0usize; let mut recommendations = Vec::new(); for group in &scan_result.repos { - let mut has_untracked = false; - let mut has_missing = false; - - for item in &group.items { - match item.classification { - LfsClassification::Orphaned if options.prune => { - let count = item - .path - .trim_start_matches('<') - .split_once(' ') - .and_then(|(n, _)| n.parse::().ok()) - .unwrap_or(0); - let bytes = item.size_bytes.unwrap_or(0); - - if options.dry_run { + // The seam owns iterate/filter/act/aggregate for this group's items; the + // per-group recommendations below are the aggregate extra it doesn't model. + let result = core_run_clean( + &group.items, + |item| { + if should_clean(item, options) { + Decision::Clean + } else { + Decision::Skip + } + }, + |item, out| { + let count = item + .path + .trim_start_matches('<') + .split_once(' ') + .and_then(|(n, _)| n.parse::().ok()) + .unwrap_or(0); + let bytes = item.size_bytes.unwrap_or(0); + + if options.dry_run { + writeln!( + out, + "would prune {} orphaned LFS objects ({}) in {}", + count, + format_bytes(bytes), + group.name, + )?; + return Ok(Outcome::Cleaned(PrunedRepo { + repo: group.repo_path.clone(), + objects_pruned: count, + bytes_freed: bytes, + dry_run: true, + })); + } + + match git.lfs_prune(&group.repo_path) { + Ok(()) => { writeln!( out, - "would prune {} orphaned LFS objects ({}) in {}", + "pruned {} orphaned LFS objects ({}) in {}", count, format_bytes(bytes), group.name, )?; - pruned.push(PrunedRepo { + Ok(Outcome::Cleaned(PrunedRepo { repo: group.repo_path.clone(), objects_pruned: count, bytes_freed: bytes, - dry_run: true, - }); - } else { - match git.lfs_prune(&group.repo_path) { - Ok(()) => { - writeln!( - out, - "pruned {} orphaned LFS objects ({}) in {}", - count, - format_bytes(bytes), - group.name, - )?; - pruned.push(PrunedRepo { - repo: group.repo_path.clone(), - objects_pruned: count, - bytes_freed: bytes, - dry_run: false, - }); - } - Err(e) => { - writeln!( - out, - "error: could not prune LFS objects in {}: {e}", - group.name, - )?; - failed.push(FailedItem { - repo: group.repo_path.clone(), - name: group.name.clone(), - reason: e.to_string(), - }); - } - } + dry_run: false, + })) + } + Err(e) => { + writeln!( + out, + "error: could not prune LFS objects in {}: {e}", + group.name, + )?; + Ok(Outcome::Failed(FailedItem { + repo: group.repo_path.clone(), + name: group.name.clone(), + reason: e.to_string(), + })) } } - LfsClassification::Untracked => { - has_untracked = true; - skipped += 1; - } - LfsClassification::Missing => { - has_missing = true; - skipped += 1; - } - _ => { - skipped += 1; - } - } - } + }, + out, + )?; + + succeeded.extend(result.succeeded); + failed.extend(result.failed); + skipped += result.skipped; + + // Per-group recommendations: emitted after this group's prune lines so a + // group's output stays contiguous, then collected for tests/JSON. + let has_untracked = group + .items + .iter() + .any(|i| i.classification == LfsClassification::Untracked); + let has_missing = group + .items + .iter() + .any(|i| i.classification == LfsClassification::Missing); if has_untracked { let msg = format!( @@ -159,7 +190,7 @@ pub fn run_clean( Ok(LfsCleanResult { result: CleanResult { - succeeded: pruned, + succeeded, failed, skipped, },