Skip to content

sim: return removed chunk hashes from remove_shard_dedup_entries#878

Open
sirahd wants to merge 1 commit into
mainfrom
sirahd/sim-dedup-return-removed-chunks
Open

sim: return removed chunk hashes from remove_shard_dedup_entries#878
sirahd wants to merge 1 commit into
mainfrom
sirahd/sim-dedup-return-removed-chunks

Conversation

@sirahd

@sirahd sirahd commented Jun 17, 2026

Copy link
Copy Markdown
Collaborator

Summary

DeletionControlableClient::remove_shard_dedup_entries previously returned Result<()>, discarding which global-dedup keys it reclaimed. The production CAS gc_delete_shard_dedup endpoint already returns the removed chunk hashes; this brings the simulation clients in line so callers (GC Stage 4's dedup-removal audit) can observe the same data through the simulation harness.

Changes

  • Trait (deletion_controls.rs): remove_shard_dedup_entries(&self, shard_hash) -> Result<Vec<MerkleHash>>.
  • LocalClient: accumulates the GLOBAL_DEDUP_TABLE keys it deletes (across the retry passes) and returns them.
  • MemoryClient: collects global_dedup keys before clear() (the in-memory client holds a single shard, so all entries belong to it).
  • HTTP path: the local-server route serializes the list as Json(Vec<HexMerkleHash>); SimulationControlClient parses it back into Vec<MerkleHash>.
  • Tests: the shared test_remove_shard_dedup_entries_* cases now assert the returned list reports the target shard's chunks and excludes other shards'; the no-op case asserts an empty list. The SimulationControlClient suite + the builder-wired server test cover the full HTTP serialize→parse round-trip.

Motivation / downstream

Consumed by xet-garbage-collection Stage 4, which collects removed chunk hashes into an audit artifact (GC PR #102). That PR's SimulationConnector currently returns an empty list because this trait didn't surface it; once this lands and GC bumps its pinned xet-core rev, the simulation path can carry the real chunk list end-to-end.

Testing

  • cargo build -p xet-client --features simulation — clean.
  • cargo test -p xet-client --features simulation deletion — 7 passed (LocalClient, memory backend, and SimulationControlClient HTTP suite).
  • cargo clippy -r -p xet-client --features simulation -- -D warnings — clean. (Pre-existing --all-targets test-code lints in hub_client/client.rs and chunk_cache/disk.rs are untouched here.)

🤖 Generated with Claude Code


Note

Low Risk
Changes are confined to the simulation feature and deletion-control APIs; behavior for unknown shards remains a no-op with an empty list, with expanded test coverage.

Overview
remove_shard_dedup_entries now returns Vec<MerkleHash> of reclaimed global-dedup chunk keys, aligned with production gc_delete_shard_dedup, so GC simulation can audit deregistered dedup entries.

The DeletionControlableClient trait and all simulation backends (LocalClient, MemoryClient) collect and return the keys they delete. The local-server DELETE /simulation/shards/{hash}/dedup_entries route responds with Json(Vec<HexMerkleHash>) instead of an empty body; SimulationControlClient deserializes that list. Shared deletion tests and the builder-wired HTTP test assert the returned set matches the target shard and is empty for unknown shards.

Reviewed by Cursor Bugbot for commit 552fd56. Bugbot is set up for automated code reviews on this repo. Configure here.

The DeletionControlableClient::remove_shard_dedup_entries operation
discarded which dedup keys it reclaimed. Return the removed chunk hashes
so callers (GC Stage 4) can audit them, mirroring the production CAS
gc_delete_shard_dedup endpoint that already returns this list.

- trait + LocalClient (collect deleted GLOBAL_DEDUP_TABLE keys) + MemoryClient
  (collect global_dedup keys before clear) now return Vec<MerkleHash>
- local-server HTTP route serializes the list as JSON (Vec<HexMerkleHash>);
  SimulationControlClient parses it back
- shared deletion tests assert the returned list reports the target shard's
  chunks and excludes others; the server suite covers the HTTP round-trip

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@sirahd sirahd requested review from assafvayner, hoytak and seanses June 17, 2026 19:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants