Chunk downloads during migration#1013
Open
ChrisSchinnerl wants to merge 4 commits into
Open
Conversation
d573b10 to
be6760a
Compare
be6760a to
fbc3b25
Compare
Contributor
There was a problem hiding this comment.
Pull request overview
This PR refactors slab migration recovery to use segment-aligned, chunked reads so recovery can fan out across more hosts in parallel, reducing end-to-end migration latency when full-slab downloads are the bottleneck.
Changes:
- Replace whole-shard slab downloading with chunked
recoverShardsthat decrypts and Reed-Solomon reconstructs required shards chunk-by-chunk across more hosts. - Add
recoveryChunkSizeconfiguration and adaptive “hedged” racing based on observed read throughput (ReadEstimate). - Update test helpers/mocks and add recovery-focused tests to validate byte-exact reconstruction and demotion behavior.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| slabs/migrations.go | Switch migration path to recover only required shards via chunked recovery and then re-encrypt for upload. |
| slabs/manager.go | Add recovery chunk sizing configuration and extend HostClient with ReadEstimate. |
| slabs/manager_test.go | Extend mock host client to satisfy the new ReadEstimate HostClient method. |
| slabs/export_test.go | Update test exports to use RecoverShards and allow tests to adjust recovery chunk size. |
| slabs/downloads.go | Implement chunked slab recovery, adaptive racing, host exclusion on lost sectors, and chunk reconstruction. |
| slabs/downloads_test.go | Replace download tests with recovery tests validating correctness, error cases, racing, lost-sector handling, and demotion logic. |
Comments suppressed due to low confidence (1)
slabs/migrations.go:102
- The loop variable
requiredshadows therequiredslice, which is easy to misread and makes future edits error-prone. Rename the boolean loop variable to avoid shadowing.
for i, required := range required {
if !required {
shards[i] = nil
continue
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
chris124567
approved these changes
Jun 26, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR implements chunking on migration downloads. Allowing an indexer to leverage 30 hosts instead of just 10 when downloading the sectors of a slab.
With this Zeus goes down to ~5s per slab download where previously we have seen quite a few 20-30s downloads. It really doesn't speed up overall throughput but allows us to be more efficient and go down to 64 workers instead of the 128 we previously had while achieving the same repair rate.