Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion CLAUDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -45,7 +45,7 @@ This is a Cargo workspace with a shared core library and seven binary crates.
- **Scan pipeline seam** (`git-tidy-core/src/scan.rs`): Two layers. **Layer 1** `parallel_classify<G>(repo_paths, classify_fn, label, progress)` runs a per-repo classifier closure in parallel (rayon) with progress + warning collection over any group type `G`; used directly by `git-repo-tidy`, `git-lfs-tidy`, and `git-config-tidy`. **Layer 2** `run_pipeline<T: Classified + Send>(...)` is the high-level seam for the five uniform tools (remote, tag, stash, branch, worktree): optional `parallel_fetch` (gated by `ScanOptions { fetch }`), `parallel_classify` dispatch wrapping each non-empty `Vec<T>` into a `RepoGroup<T>` (empty groups dropped), `Counts` aggregation via `Classified::classification_label`, and `ScanResult<T>` assembly. Generic `RepoGroup<T>`/`ScanResult<T>` replace each uniform tool's former hand-rolled `*RepoGroup`/`*ScanResult` (exposed per tool as `pub type *ScanResult = ScanResult<*Info>`). The two exception tools keep bespoke results (repo-tidy flat with disk metrics + a cross-cutting `dirty` count; lfs-tidy with `LfsRepoGroup`/`lfs_installed`) and stay on Layer 1.
- **Generic counts** (`git-tidy-core/src/counts.rs`): one `Counts` newtype over `BTreeMap<String, usize>` (`increment`/`get`/`total`/`iter`/`from_pairs`) replaces all per-tool `*Counts` structs and the `define_counts!` macro; keyed by `ClassificationLabel::label()` strings (also the audit-JSON keys). `output::write_summary_line` renders a tool's human summary from a `Counts` plus an ordered `&[(display_word, count_key)]` spec.
- **Library-first design**: `scan.rs` and `clean.rs` are library functions; `main.rs` is thin dispatch. Enables `git-tidy` to call each tool's scan/lint as a library.
- **`CleanPipeline` seam** (`git-tidy-core/src/clean.rs`): `run_clean<I, S>(items, decide, act, out) -> Result<CleanResult<S>, Error>` owns the shared clean loop — iterate → pre-filter → act → aggregate `succeeded`/`failed`/`skipped` — exactly once. It is generic over the item type because five of seven tools classify on their own enums, not the shared `Classification`. Each tool supplies a `decide` predicate (the pure, silent classification filter — `Decision::Clean`/`Skip`; the per-tool `should_clean`) and an `act` closure that owns everything tool-specific: dry-run-vs-real branching and the exact "would …" wording, any IO guard (returning `Outcome::Skipped` after printing its own warning), the actual delete/prune, and construction of the success record. `act` returns `Outcome::{Cleaned(S), Skipped, Failed(FailedItem)}`; it returns `Err` only for genuinely fatal errors (an `out` write failure, a hard removal failure), which short-circuit the loop. Aggregate extras the pipeline doesn't model (e.g. repo-tidy's `dirty_blocked`) stay tool-side: `act` captures a `&mut` and the tool wraps the returned `CleanResult<S>` (see `RepoCleanResult`). Migrated so far: `git-repo-tidy`. The other six tools still carry their own clean loops (issue #118 step 3).
- **`CleanPipeline` seam** (`git-tidy-core/src/clean.rs`): `run_clean<I, S>(items, decide, act, out) -> Result<CleanResult<S>, Error>` owns the shared clean loop — iterate → pre-filter → act → aggregate `succeeded`/`failed`/`skipped` — exactly once. It is generic over the item type because five of seven tools classify on their own enums, not the shared `Classification`. Each tool supplies a `decide` predicate (the pure, silent classification filter — `Decision::Clean`/`Skip`; the per-tool `should_clean`) and an `act` closure that owns everything tool-specific: dry-run-vs-real branching and the exact "would …" wording, any IO guard (returning `Outcome::Skipped` after printing its own warning), the actual delete/prune, and construction of the success record. `act` returns `Outcome::{Cleaned(S), Skipped, Failed(FailedItem)}`; it returns `Err` only for genuinely fatal errors (an `out` write failure, a hard removal failure), which short-circuit the loop. Aggregate extras the pipeline doesn't model (e.g. repo-tidy's `dirty_blocked`, lfs-tidy's per-group recommendations) stay tool-side: `act` captures a `&mut` (or the tool computes them around the call) and the tool wraps the returned `CleanResult<S>` (see `RepoCleanResult`/`LfsCleanResult`). All seven tools now use this seam — each tool's `clean.rs` is a thin `decide`/`act` adapter over `run_clean`; branch- and worktree-tidy additionally share the `Classification`-typed `should_clean_landed` filter, and grouped tools call `run_clean` once per repo group (stash pre-sorts each group's items descending to preserve drop order).
- **`CachingGitOps`** (`git-tidy-core/src/caching.rs`): `GitOps` wrapper that memoizes read-only queries (`fetch_prune`, `symbolic_ref_origin_head`, `rev_parse_verify`, `list_local_branches`, `list_remotes`, `ls_remote_check`, `list_builtin_commands`, `lfs_installed`, `log_grep`, `diff_commit`, `diff_commit_files`, `diff_commit_on_ref`) via `Mutex<HashMap>`. Used by the in-process audit runner to avoid redundant git calls across tools. A `delegate_git_ops!` macro forwards uncached methods to the inner `GitOps`.

### CLI pattern (shared `resolve_directory`, per-crate `clap` definitions)
Expand Down
165 changes: 98 additions & 67 deletions crates/git-lfs-tidy/src/clean.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2,12 +2,13 @@ use std::io::Write;
use std::ops::Deref;
use std::path::PathBuf;

use git_tidy_core::clean::{Decision, Outcome, run_clean as core_run_clean};
use git_tidy_core::error::Error;
use git_tidy_core::git::GitOps;
use git_tidy_core::types::{CleanResult, FailedItem};

use crate::output::format_bytes;
use crate::types::{LfsClassification, LfsScanResult};
use crate::types::{LfsClassification, LfsInfo, LfsScanResult};

/// Options controlling LFS cleanup behavior.
pub struct CleanOptions {
Expand All @@ -28,7 +29,8 @@ pub struct LfsCleanResult {
pub result: CleanResult<PrunedRepo>,
/// Recommendations printed to the user (e.g. `git lfs migrate`). Already
/// emitted inline during the pass; retained here for tests and future JSON
/// output. A candidate to drop once lfs moves onto the shared CleanPipeline.
/// output. Computed tool-side around the per-group `run_clean` call, since the
/// shared pipeline does not model per-group aggregate extras.
#[allow(dead_code)]
pub recommendations: Vec<String>,
}
Expand All @@ -52,92 +54,121 @@ pub struct PrunedRepo {
pub dry_run: bool,
}

/// Determine if an LFS item should be acted on by `act`.
///
/// Only orphaned objects are prunable, and only when `--prune` is set. Every
/// other classification (untracked, missing, healthy) is skipped here; the
/// pipeline counts the rejection as a skip, matching the historical
/// `skipped += 1; continue;`. Untracked/missing items still drive per-group
/// recommendations, which are detected tool-side around the per-group call.
fn should_clean(item: &LfsInfo, options: &CleanOptions) -> bool {
matches!(item.classification, LfsClassification::Orphaned) && options.prune
}

/// Run the clean operation on a scan result.
///
/// lfs groups items by repo. We call the shared [`core_run_clean`] loop once per
/// group over that group's items, accumulating each group's [`CleanResult`] into
/// a single aggregate. The per-group health recommendations are not modeled by the
/// pipeline, so they stay tool-side: we detect `has_untracked` / `has_missing`
/// from the group's classifications and emit the recommendation lines *after* that
/// group's `core_run_clean` call. This keeps a group's prune lines, then its
/// recommendation lines, contiguous before the next group is processed.
pub fn run_clean(
git: &dyn GitOps,
scan_result: &LfsScanResult,
options: &CleanOptions,
out: &mut dyn Write,
) -> Result<LfsCleanResult, Error> {
let mut pruned = Vec::new();
let mut succeeded = Vec::new();
let mut failed = Vec::new();
let mut skipped = 0;
let mut skipped = 0usize;
let mut recommendations = Vec::new();

for group in &scan_result.repos {
let mut has_untracked = false;
let mut has_missing = false;

for item in &group.items {
match item.classification {
LfsClassification::Orphaned if options.prune => {
let count = item
.path
.trim_start_matches('<')
.split_once(' ')
.and_then(|(n, _)| n.parse::<usize>().ok())
.unwrap_or(0);
let bytes = item.size_bytes.unwrap_or(0);

if options.dry_run {
// The seam owns iterate/filter/act/aggregate for this group's items; the
// per-group recommendations below are the aggregate extra it doesn't model.
let result = core_run_clean(
&group.items,
|item| {
if should_clean(item, options) {
Decision::Clean
} else {
Decision::Skip
}
},
|item, out| {
let count = item
.path
.trim_start_matches('<')
.split_once(' ')
.and_then(|(n, _)| n.parse::<usize>().ok())
.unwrap_or(0);
let bytes = item.size_bytes.unwrap_or(0);

if options.dry_run {
writeln!(
out,
"would prune {} orphaned LFS objects ({}) in {}",
count,
format_bytes(bytes),
group.name,
)?;
return Ok(Outcome::Cleaned(PrunedRepo {
repo: group.repo_path.clone(),
objects_pruned: count,
bytes_freed: bytes,
dry_run: true,
}));
}

match git.lfs_prune(&group.repo_path) {
Ok(()) => {
writeln!(
out,
"would prune {} orphaned LFS objects ({}) in {}",
"pruned {} orphaned LFS objects ({}) in {}",
count,
format_bytes(bytes),
group.name,
)?;
pruned.push(PrunedRepo {
Ok(Outcome::Cleaned(PrunedRepo {
repo: group.repo_path.clone(),
objects_pruned: count,
bytes_freed: bytes,
dry_run: true,
});
} else {
match git.lfs_prune(&group.repo_path) {
Ok(()) => {
writeln!(
out,
"pruned {} orphaned LFS objects ({}) in {}",
count,
format_bytes(bytes),
group.name,
)?;
pruned.push(PrunedRepo {
repo: group.repo_path.clone(),
objects_pruned: count,
bytes_freed: bytes,
dry_run: false,
});
}
Err(e) => {
writeln!(
out,
"error: could not prune LFS objects in {}: {e}",
group.name,
)?;
failed.push(FailedItem {
repo: group.repo_path.clone(),
name: group.name.clone(),
reason: e.to_string(),
});
}
}
dry_run: false,
}))
}
Err(e) => {
writeln!(
out,
"error: could not prune LFS objects in {}: {e}",
group.name,
)?;
Ok(Outcome::Failed(FailedItem {
repo: group.repo_path.clone(),
name: group.name.clone(),
reason: e.to_string(),
}))
}
}
LfsClassification::Untracked => {
has_untracked = true;
skipped += 1;
}
LfsClassification::Missing => {
has_missing = true;
skipped += 1;
}
_ => {
skipped += 1;
}
}
}
},
out,
)?;

succeeded.extend(result.succeeded);
failed.extend(result.failed);
skipped += result.skipped;

// Per-group recommendations: emitted after this group's prune lines so a
// group's output stays contiguous, then collected for tests/JSON.
let has_untracked = group
.items
.iter()
.any(|i| i.classification == LfsClassification::Untracked);
let has_missing = group
.items
.iter()
.any(|i| i.classification == LfsClassification::Missing);

if has_untracked {
let msg = format!(
Expand All @@ -159,7 +190,7 @@ pub fn run_clean(

Ok(LfsCleanResult {
result: CleanResult {
succeeded: pruned,
succeeded,
failed,
skipped,
},
Expand Down