Summary
Knowledge Bases supports custom transformation Lambda functions that run during ingestion to apply custom chunking or metadata enrichment. Today, customers must manually parse the KB event format, read and write content batches from S3, and return the exact expected response structure. This is significant boilerplate that obscures the actual transformation logic.
The SDK should provide a @kb_transformation decorator that handles all the plumbing (S3 I/O, event parsing, batch iteration, and response formatting) so the customer only writes their transformation logic.
Proposed API
```python
from bedrock_agentcore.knowledge_base import kb_transformation  # proposed API


@kb_transformation
def my_chunker(content: str, metadata: dict) -> list[dict]:
    """Custom chunking: split on headings and enrich metadata."""
    # Split on level-1 Markdown headings; each part becomes one chunk.
    chunks = content.split("\n# ")
    return [
        {"contentBody": chunk, "contentMetadata": {"section": i}}
        for i, chunk in enumerate(chunks)
    ]
```
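Because the transformation logic is a plain function, it can be exercised locally without the decorator or any AWS resources. The sketch below repeats `my_chunker` without the proposed (not yet available) import so it runs standalone. It also shows one quirk of the example: `split("\n# ")` consumes the `# ` marker from every heading except one at the very start of the content.

```python
def my_chunker(content: str, metadata: dict) -> list[dict]:
    """Split on level-1 Markdown headings and tag each chunk with its index."""
    chunks = content.split("\n# ")
    return [
        {"contentBody": chunk, "contentMetadata": {"section": i}}
        for i, chunk in enumerate(chunks)
    ]


doc = "# Intro\nHello\n# Usage\nRun it"
result = my_chunker(doc, {})
for chunk in result:
    print(chunk)
# The first chunk keeps "# Intro"; the second chunk's body is "Usage\nRun it".
```

If the heading text should stay in every chunk, splitting with a regular expression lookahead (for example `re.split(r"\n(?=# )", content)`) is one alternative.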
The decorator would:
- Parse the Lambda event: extract the S3 bucket/key for each input content batch and the output location
- Handle S3 I/O: read content objects from S3 and write the transformed output back to S3
- Iterate over batches: call the user function once per document/content item in each batch
- Format the response: return the exact structure the Knowledge Bases ingestion pipeline expects
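The four responsibilities above can be sketched as a single decorator. This is a minimal, hypothetical implementation, not the SDK's API. The event and batch field names (`bucketName`, `inputFiles`, `contentBatches`, `originalFileLocation`, `fileMetadata`, `fileContents`, `outputFiles`) are assumptions based on the Bedrock custom transformation contract and should be checked against the service documentation. The `store_factory` and `output_prefix` parameters are also hypothetical: S3 access goes through a small store object so the sketch can run locally with an in-memory stand-in, while `S3Store` uses the real boto3 `get_object`/`put_object` calls.

```python
import functools
import json


class S3Store:
    """Reads/writes objects via boto3; imported lazily so the sketch runs without it."""

    def __init__(self):
        import boto3  # only required when running against real S3
        self._s3 = boto3.client("s3")

    def read(self, bucket: str, key: str) -> bytes:
        return self._s3.get_object(Bucket=bucket, Key=key)["Body"].read()

    def write(self, bucket: str, key: str, body: bytes) -> None:
        self._s3.put_object(Bucket=bucket, Key=key, Body=body)


def kb_transformation(func=None, *, store_factory=S3Store, output_prefix="transformed/"):
    """Hypothetical decorator: turns (content, metadata) -> chunks into a Lambda handler."""

    def decorate(fn):
        @functools.wraps(fn)
        def handler(event, context=None):
            store = store_factory()
            bucket = event["bucketName"]  # assumed field name
            output_files = []
            for input_file in event["inputFiles"]:
                out_batches = []
                for batch in input_file["contentBatches"]:
                    # Read one content batch written by the earlier ingestion step.
                    payload = json.loads(store.read(bucket, batch["key"]))
                    new_items = []
                    for item in payload["fileContents"]:
                        # Call the user function once per content item.
                        base_meta = item.get("contentMetadata", {})
                        for chunk in fn(item["contentBody"], base_meta):
                            new_items.append({
                                **item,  # carry through other fields, e.g. contentType
                                "contentBody": chunk["contentBody"],
                                # Design choice: chunk metadata is merged over the item's.
                                "contentMetadata": {**base_meta, **chunk.get("contentMetadata", {})},
                            })
                    # Write the transformed batch under a new key and record it.
                    out_key = output_prefix + batch["key"]
                    store.write(bucket, out_key, json.dumps({"fileContents": new_items}).encode("utf-8"))
                    out_batches.append({"key": out_key})
                output_files.append({
                    "originalFileLocation": input_file["originalFileLocation"],
                    "fileMetadata": input_file.get("fileMetadata", {}),
                    "contentBatches": out_batches,
                })
            return {"outputFiles": output_files}  # assumed response shape

        return handler

    # Support both @kb_transformation and @kb_transformation(...).
    return decorate(func) if func is not None else decorate


class InMemoryStore:
    """Dict-backed stand-in for S3, used for a local dry run."""
    def __init__(self):
        self.objects = {}
    def read(self, bucket, key):
        return self.objects[(bucket, key)]
    def write(self, bucket, key, body):
        self.objects[(bucket, key)] = body


store = InMemoryStore()
store.write("kb-bucket", "batch-1.json", json.dumps(
    {"fileContents": [{"contentBody": "A\n# B", "contentMetadata": {"src": "doc1"}}]}
).encode("utf-8"))


@kb_transformation(store_factory=lambda: store)
def split_headings(content: str, metadata: dict) -> list[dict]:
    parts = content.split("\n# ")
    return [{"contentBody": p, "contentMetadata": {"section": i}} for i, p in enumerate(parts)]


event = {
    "bucketName": "kb-bucket",
    "inputFiles": [{
        "originalFileLocation": {"type": "S3", "s3_location": {"uri": "s3://source-bucket/doc1.md"}},
        "fileMetadata": {},
        "contentBatches": [{"key": "batch-1.json"}],
    }],
}
response = split_headings(event)
```

Injecting the store keeps the plumbing testable offline. In Lambda, the bare `@kb_transformation` form would fall back to `S3Store`, and the decorated function itself would serve as the handler.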
Current pain points (without the decorator)
- Customers must learn the transformation Lambda's event schema and implement it by hand
- S3 read/write logic (including error handling, content-type detection) must be implemented from scratch
- Batch iteration and response assembly are repetitive boilerplate in every transformation Lambda
- Small mistakes in the response structure cause silent ingestion failures that are hard to debug
Expected behavior
- The decorator should accept, at minimum, a function with the signature `(content: str, metadata: dict) -> list[dict]`
- Each dict in the returned list represents one chunk, with a required `contentBody` (str) and an optional `contentMetadata` (dict)
- The decorator should handle all S3 operations, event parsing, and response formatting transparently
- Errors in the user function should surface clearly rather than being swallowed by the S3/response plumbing
- The decorated function should work as a standard AWS Lambda handler (compatible with the `lambda_handler` entry point pattern)
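One way to meet the "errors should surface clearly" requirement is to validate the user function's return value against the chunk contract above before any S3 writes happen, and to re-raise user exceptions with the batch key attached. The names `TransformationError`, `validate_chunks`, and `call_user_function` are hypothetical and used only for illustration. A decorator's batch loop could call `call_user_function` instead of invoking the user function directly.

```python
class TransformationError(Exception):
    """Raised when the user function fails or returns malformed chunks."""


def validate_chunks(chunks, source_key):
    """Check the (hypothetical) contract: list of dicts, str contentBody, optional dict metadata."""
    if not isinstance(chunks, list):
        raise TransformationError(f"{source_key}: expected list[dict], got {type(chunks).__name__}")
    for i, chunk in enumerate(chunks):
        if not isinstance(chunk, dict):
            raise TransformationError(f"{source_key}[{i}]: chunk must be a dict, got {type(chunk).__name__}")
        if not isinstance(chunk.get("contentBody"), str):
            raise TransformationError(f"{source_key}[{i}]: 'contentBody' must be a str")
        if not isinstance(chunk.get("contentMetadata", {}), dict):
            raise TransformationError(f"{source_key}[{i}]: 'contentMetadata' must be a dict if present")
    return chunks


def call_user_function(fn, content, metadata, source_key):
    """Run the user function, attaching the batch key to any failure."""
    try:
        chunks = fn(content, metadata)
    except Exception as exc:
        # `from exc` keeps the original traceback available as __cause__.
        raise TransformationError(f"{source_key}: user function {fn.__name__} raised {exc!r}") from exc
    return validate_chunks(chunks, source_key)


def bad_chunker(content, metadata):
    return [{"body": content}]  # wrong key: should be "contentBody"


message = None
try:
    call_user_function(bad_chunker, "some text", {}, "batch-1.json")
except TransformationError as err:
    message = str(err)  # "batch-1.json[0]: 'contentBody' must be a str"
```

Failing fast on a malformed chunk produces an error that names the offending batch and chunk index. This addresses the silent-failure pain point: an invalid response structure is caught before the ingestion pipeline consumes it.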
Additional considerations
- Should the decorator support async transformation functions?
- Should there be a lower-level variant that gives access to the raw batch (for cases where per-document iteration isn't desired)?
- Consider exposing the source document metadata (filename, content type, data source ID) to the user function for context-aware transformations
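On the async question, one possible approach is for the decorator to detect coroutine functions with `inspect.iscoroutinefunction` and bridge them into a synchronous handler by running each batch's items concurrently under a single `asyncio.run`. The helper name `transform_items` is hypothetical. The approach assumes the handler is invoked synchronously with no event loop already running, because `asyncio.run` refuses to start inside a running loop.

```python
import asyncio
import inspect


async def _gather(fn, items):
    # Start one coroutine per content item and wait for all of them.
    return await asyncio.gather(
        *(fn(i["contentBody"], i.get("contentMetadata", {})) for i in items)
    )


def transform_items(fn, items):
    """Return one chunk list per content item, for either a sync or an async `fn`."""
    if inspect.iscoroutinefunction(fn):
        # One event loop per batch; items in the batch run concurrently.
        return asyncio.run(_gather(fn, items))
    return [fn(i["contentBody"], i.get("contentMetadata", {})) for i in items]


async def enrich(content, metadata):
    await asyncio.sleep(0)  # stand-in for an awaited call, e.g. to an external service
    return [{"contentBody": content, "contentMetadata": {**metadata, "length": len(content)}}]


items = [{"contentBody": "alpha"}, {"contentBody": "be", "contentMetadata": {"src": "x"}}]
results = transform_items(enrich, items)
```

`asyncio.gather` preserves input order, so results still line up with the batch's items. A lower-level raw-batch variant could take the same shape, passing the full `items` list to the user function instead of fanning it out.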