
Chunk downloads during migration#1013

Open: ChrisSchinnerl wants to merge 4 commits into master from chris/chunk


ChrisSchinnerl (Member) commented Jun 18, 2026

This PR implements chunking on migration downloads, allowing an indexer to leverage 30 hosts instead of just 10 when downloading the sectors of a slab.

With this change, Zeus goes down to ~5s per slab download, where previously we saw quite a few 20-30s downloads. It doesn't really speed up overall throughput, but it lets us be more efficient and drop to 64 workers instead of the 128 we had before while achieving the same repair rate.
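
To illustrate the idea of chunking, here is a minimal sketch of how a sector could be split into segment-aligned ranges and how each range could be pointed at a different group of hosts. It is not the code in this PR: `splitSector`, `hostsForChunk`, and `chunkRange` are hypothetical names, and the 64-byte segment size, 4 MiB sector size, and 10-of-30 shard layout are assumptions made for the example.

```go
package main

import "fmt"

// Assumed constants for illustration only; the PR's real values may differ.
const (
	segmentSize = 64      // bytes per segment (assumption)
	sectorSize  = 1 << 22 // 4 MiB per sector (assumption)
)

// chunkRange is a hypothetical type describing one read within a sector.
type chunkRange struct {
	Offset, Length uint64
}

// splitSector divides [0, sectorSize) into ranges of at most chunkSize bytes.
// chunkSize is rounded down to a multiple of segmentSize so that every range
// starts and ends on a segment boundary.
func splitSector(chunkSize uint64) ([]chunkRange, error) {
	aligned := chunkSize - chunkSize%segmentSize
	if aligned == 0 {
		return nil, fmt.Errorf("chunk size %d is smaller than one segment (%d bytes)", chunkSize, segmentSize)
	}
	var ranges []chunkRange
	for off := uint64(0); off < sectorSize; off += aligned {
		length := aligned
		if off+length > sectorSize {
			length = sectorSize - off
		}
		ranges = append(ranges, chunkRange{Offset: off, Length: length})
	}
	return ranges, nil
}

// hostsForChunk picks minShards shard indices out of totalShards for chunk i,
// rotating the window so that consecutive chunks are read from different hosts.
func hostsForChunk(i, minShards, totalShards int) []int {
	hosts := make([]int, minShards)
	for j := range hosts {
		hosts[j] = (i*minShards + j) % totalShards
	}
	return hosts
}
```

Under these assumptions, a 1 MiB chunk size yields four ranges per sector, and the first three chunks are assigned shard indices 0-9, 10-19, and 20-29, so all 30 hosts serve part of the download.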

Copilot AI (Contributor) left a comment

Pull request overview

This PR refactors slab migration recovery to use segment-aligned, chunked reads so recovery can fan out across more hosts in parallel, reducing end-to-end migration latency when full-slab downloads are the bottleneck.

Changes:

  • Replace whole-shard slab downloading with chunked recoverShards that decrypts and Reed-Solomon reconstructs required shards chunk-by-chunk across more hosts.
  • Add recoveryChunkSize configuration and adaptive “hedged” racing based on observed read throughput (ReadEstimate).
  • Update test helpers/mocks and add recovery-focused tests to validate byte-exact reconstruction and demotion behavior.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| slabs/migrations.go | Switch migration path to recover only required shards via chunked recovery and then re-encrypt for upload. |
| slabs/manager.go | Add recovery chunk sizing configuration and extend HostClient with ReadEstimate. |
| slabs/manager_test.go | Extend mock host client to satisfy the new ReadEstimate HostClient method. |
| slabs/export_test.go | Update test exports to use RecoverShards and allow tests to adjust recovery chunk size. |
| slabs/downloads.go | Implement chunked slab recovery, adaptive racing, host exclusion on lost sectors, and chunk reconstruction. |
| slabs/downloads_test.go | Replace download tests with recovery tests validating correctness, error cases, racing, lost-sector handling, and demotion logic. |
Comments suppressed due to low confidence (1)

slabs/migrations.go:102

  • The loop variable required shadows the required slice, which is easy to misread and makes future edits error-prone. Rename the boolean loop variable to avoid shadowing.
	for i, required := range required {
		if !required {
			shards[i] = nil
			continue
		}
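
One possible rename for the loop flagged above, as a standalone sketch. The helper `dropUnrequired` and the variable name `isRequired` are hypothetical; the sketch assumes `shards` and `required` have the same length and only shows the part of the loop body quoted in the comment.

```go
package main

// dropUnrequired clears every shard that is not marked as required.
// The loop variable is named isRequired so it no longer shadows the
// required slice. Assumes len(shards) == len(required).
func dropUnrequired(shards [][]byte, required []bool) {
	for i, isRequired := range required {
		if !isRequired {
			shards[i] = nil
			continue
		}
	}
}
```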


Comment thread slabs/manager.go Outdated
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@ChrisSchinnerl ChrisSchinnerl marked this pull request as ready for review June 26, 2026 05:12

Projects

Status: In Progress

3 participants