Skip to content

fix: drain BatchGetItem paginator in async batchLoad to retry UnprocessedKeys#253

Merged
ThePumpingLemma merged 1 commit into
cashapp:mainfrom
polyatail:roo/async-batchload-unprocessedkeys
May 12, 2026
Merged

fix: drain BatchGetItem paginator in async batchLoad to retry UnprocessedKeys#253
ThePumpingLemma merged 1 commit into
cashapp:mainfrom
polyatail:roo/async-batchload-unprocessedkeys

Conversation

@polyatail

Copy link
Copy Markdown
Contributor

Summary

AsyncLogicalDb.batchLoad (and batchLoadAsync) silently drops UnprocessedKeys returned by DynamoDB under throttling or partial-result conditions, contradicting the KDoc:

A single operation can retrieve up to 16 MB of data, which can contain as many as 100 items. BatchGetItem returns a partial result if the response size limit is exceeded, the table's provisioned throughput is exceeded, or an internal processing failure occurs. If a partial result is returned, this method backs off and retries the UnprocessedKeys in the next API call.

The implementation called dynamoDbEnhancedClient.batchGetItem(request).limit(1).asFlow(). batchGetItem returns a BatchGetResultPagePublisher whose auto-paginator emits one BatchGetResultPage per HTTP round-trip and continues issuing follow-up BatchGetItem calls as long as the previous response had UnprocessedKeys. The .limit(1) truncated this to the first page — so any keys returned in UnprocessedKeys on that first response were never retried, never decoded, and the caller never found out.

Callers couldn't distinguish "key didn't exist" from "throttled and silently dropped." The sync LogicalDb.batchLoad path was already correct (it drains the full paginator iterator) and was independently fixed in #176; the async path was missed.

This PR drops .limit(1) and lets the paginator drain naturally. The suspend wrapper in AsyncLogicalDb.batchLoad already accumulates across multiple ItemSet emissions via reduce { acc, item -> ItemSet(acc.getAllItems() + item.getAllItems()) }, so removing .limit(1) composes correctly downstream.

History

The .limit(1) originated in #154, which converted batchLoadAsync from a single-batch publisher (batchGetItem(batchRequests.first()).limit(1)) into chunked flows. The .limit(1) was carried forward from the original single-batch shape without re-evaluating against the SDK's paginator semantics.

Test plan

  • Added batchLoad more than 16MB test in AsyncLogicalDbBatchTest, mirroring the existing sync test in LogicalDbBatchTest. Generates ~20MB with 200KB descriptions to force the SDK paginator to issue follow-up BatchGetItem calls for UnprocessedKeys.
  • Verified the new test fails on main (with .limit(1) in place) and passes with the fix.
  • Full :tempest2:test suite passes locally.

🤖 Generated with Claude Code

…ssedKeys

The async path called `batchGetItem(request).limit(1).asFlow()`, which
truncated the SDK auto-paginator to its first emission. Any keys returned
in `UnprocessedKeys` (under throttling or on partial responses) were
silently dropped — contradicting the KDoc on `AsyncLogicalDb.batchLoad`
which promises that `UnprocessedKeys` are retried in follow-up calls.

The sync path was already fixed in cashapp#176; this brings the async path to
parity. Adds a `batchLoad more than 16MB` test that mirrors the sync
suite and exercises the multi-page case.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-advanced-security

Copy link
Copy Markdown

You are seeing this message because GitHub Code Scanning has recently been set up for this repository, or this pull request contains the workflow file for the Code Scanning tool.

What Enabling Code Scanning Means:

  • The 'Security' tab will display more code scanning analysis results (e.g., for the default branch).
  • Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results.
  • You will be able to see the analysis results for the pull request's branch on this overview once the scans have completed and the checks have passed.

For more information about GitHub Code Scanning, check out the documentation.

@ThePumpingLemma ThePumpingLemma merged commit ddc47c5 into cashapp:main May 12, 2026
5 checks passed
@polyatail polyatail deleted the roo/async-batchload-unprocessedkeys branch May 13, 2026 20:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants