[SS] Add a memory snapshot export monitor #12484

Open
maggie-lou wants to merge 2 commits into master from write_monitor

maggie-lou commented Jun 17, 2026

Collaborator

Replacement for this PR. Rather than interleaving the logic into the copy_on_write store, this version adds it to the firecracker package.

maggie-lou marked this pull request as ready for review June 17, 2026 20:21

buildbuddy-io (bot) left a comment

Adds two building-block types, not yet wired in, for caching Firecracker memory-snapshot chunks during export: sequentialChunkWriteTracker and memorySnapshotExportMonitor. The design is clean and the tests are readable, but there is a high-severity data race and dropped-final-chunk bug in Finish() that should be addressed (or covered by a test) before this is wired to a real COWStore; the remaining findings are non-blocking.

maggie-lou requested review from bduffany and vanja-p June 17, 2026 22:17
// chunk as soon as writes move on to a later chunk, while the snapshot data is still in the page cache.
// This removes the need for another pass through the COWStore to cache chunks after the snapshot is generated,
// reducing memory and IO from processing the large snapshots twice.
type memorySnapshotExportMonitor struct {

Member

"monitor" sounds like this is for observability, but it's really for caching. Maybe call it something like "memorySnapshotCacher" or "memorySnapshotCacheWriter"?

Comment on lines +118 to +119
log.CtxWarningf(m.egCtx, "Failed to finalize sequential chunk write tracker: %s", err)
return err

Member

Generally we should either log an error or return the error, but not both. Otherwise we'll wind up polluting the logs.

If the logging was meant to add context (e.g. that the error happened while "finalizing"), you can instead use status.WrapError to attach that context to the returned error.

Comment on lines +14 to +15
"memory_snapshot_export_monitor.go",
"sequential_chunk_write_monitor.go",

Member

nit: it seems a tad overkill to add two separate files for this. Maybe consider putting both in a single file, since the functionality is very closely related and the two things together are fairly small.

}

func (m *memorySnapshotExportMonitor) startUploadWorkers(uploadConcurrency int) {
for i := 0; i < uploadConcurrency; i++ {

Contributor

nit: this can be written as for i := range uploadConcurrency (range over an int, available since Go 1.22).


// queueChunkUpload schedules caching for a chunk that the sequential tracker has
// finalized.
func (m *memorySnapshotExportMonitor) queueChunkUpload(chunkOffset int64) error {

Contributor

This always returns nil, so maybe it doesn't need to return an error?

If it didn't return an error, then sequentialChunkWriteTracker.Finish also wouldn't need to return an error.

Comment on lines +53 to +55
if chunkSizeBytes <= 0 {
return status.FailedPreconditionErrorf("chunk size must be positive: %d", chunkSizeBytes)
}

Contributor

nit: chunkSizeBytes is the same on every iteration but is checked every time, so it could be validated once up front.

chunkSizeBytes int64

// finalizeChunk is called when no more writes are expected for the given chunk.
finalizeChunk func(chunkOffset int64) error

Contributor

Considering that memorySnapshotExportMonitor.OnWrite is already a callback, adding another layer of callbacks makes this a bit tricky to understand. Two possible options for simplification:

  1. Instead of passing in a callback, just pass in the chan int64
  2. Instead of a callback, make Observe and Finish return the chunk offset that should be written, if any.

Just food for thought. I'm not sure that either option is actually better.

}

func (m *memorySnapshotExportMonitor) run() error {
for {

Contributor

What do you think about inlining the tracker implementation into this function?

It would just need currentChunkIndex and haveCurrentChunk variables, and Observe and Finish would be inlined. validateSequentialWriteEvent could still be a separate helper function.

3 participants