[Feature] @kb_transformation decorator for Knowledge Base custom transformation Lambdas #488

@jariy17

Summary

Knowledge Bases supports custom transformation Lambda functions that run during ingestion to apply custom chunking or metadata enrichment. Today, customers must manually parse the KB event format, read/write content batches from S3, and return the exact expected response structure — significant boilerplate that obscures the actual transformation logic.

The SDK should provide a @kb_transformation decorator that handles all the plumbing (S3 I/O, event parsing, batch iteration, and response formatting) so the customer only writes their transformation logic.

Proposed API

from bedrock_agentcore.knowledge_base import kb_transformation

@kb_transformation
def my_chunker(content: str, metadata: dict) -> list[dict]:
    """Custom chunking — split on headings and enrich metadata."""
    chunks = content.split("\n# ")
    return [
        {"contentBody": chunk, "contentMetadata": {"section": i}}
        for i, chunk in enumerate(chunks)
    ]

The decorator would:

  1. Parse the Lambda event — extract the S3 bucket/key for the input content batch and the output location
  2. Handle S3 I/O — read content objects from S3, write transformed output back to S3
  3. Iterate over batches — call the user function once per document/content item in the batch
  4. Format the response — return the exact structure the Knowledge Bases ingestion pipeline expects
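The four steps above can be sketched roughly as follows. This is a minimal illustration, not the SDK's implementation: the event and batch field names (bucketName, inputFiles, contentBatches, key, fileContents, contentType, outputFiles) are assumptions that should be checked against the actual Knowledge Bases contract, and JsonStore, InMemoryStore, kb_transformation_sketch, and the transformed/ output prefix are hypothetical names introduced only for this example. Storage is injected so the sketch runs without AWS credentials; a real version would presumably wrap S3 get and put calls behind the same two methods.

```python
import functools
import json
from typing import Any, Callable, Protocol


class JsonStore(Protocol):
    """Hypothetical storage seam; a real version would wrap S3 reads/writes."""

    def read_json(self, bucket: str, key: str) -> dict: ...

    def write_json(self, bucket: str, key: str, data: dict) -> None: ...


class InMemoryStore:
    """Dict-backed stand-in for S3, used only to exercise the sketch locally."""

    def __init__(self) -> None:
        self.objects: dict[tuple[str, str], str] = {}

    def read_json(self, bucket: str, key: str) -> dict:
        return json.loads(self.objects[(bucket, key)])

    def write_json(self, bucket: str, key: str, data: dict) -> None:
        self.objects[(bucket, key)] = json.dumps(data)


def kb_transformation_sketch(store: JsonStore) -> Callable:
    """Decorator factory: turns (content, metadata) -> list[dict] into a handler."""

    def decorator(fn: Callable[[str, dict], list]) -> Callable[[dict, Any], dict]:
        @functools.wraps(fn)
        def handler(event: dict, context: Any = None) -> dict:
            # Step 1: parse the event (all field names here are assumed).
            bucket = event["bucketName"]
            output_files = []
            for input_file in event["inputFiles"]:
                out_batches = []
                for batch in input_file["contentBatches"]:
                    # Step 2: read one content batch.
                    data = store.read_json(bucket, batch["key"])
                    chunks = []
                    # Step 3: call the user function once per content item.
                    for item in data["fileContents"]:
                        for chunk in fn(item["contentBody"], item.get("contentMetadata", {})):
                            merged = dict(chunk)
                            if "contentType" in item:
                                # Carry the source content type over unless the chunk sets one.
                                merged.setdefault("contentType", item["contentType"])
                            chunks.append(merged)
                    # Step 2 (continued): write the transformed batch under an assumed prefix.
                    out_key = f"transformed/{batch['key']}"
                    store.write_json(bucket, out_key, {"fileContents": chunks})
                    out_batches.append({"key": out_key})
                # Step 4: echo the file entry back, pointing at the new batches.
                out_file = dict(input_file)
                out_file["contentBatches"] = out_batches
                output_files.append(out_file)
            return {"outputFiles": output_files}

        return handler

    return decorator
```

With this shape, decorating a function with @kb_transformation_sketch(store) yields a callable that takes (event, context), so it can be exported directly as the Lambda entry point. The bare @kb_transformation form proposed above would need to construct its own S3-backed store internally.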

Current pain points (without the decorator)

  • Customers must understand and implement the undocumented event schema for the transformation Lambda
  • S3 read/write logic (including error handling, content-type detection) must be implemented from scratch
  • Batch iteration and response assembly are repetitive boilerplate across every transformation Lambda
  • Small mistakes in the response structure cause silent ingestion failures that are hard to debug

Expected behavior

  • The decorator should accept a function with signature (content: str, metadata: dict) -> list[dict] at minimum
  • Each dict in the returned list represents a chunk with at least contentBody (str) and optionally contentMetadata (dict)
  • The decorator should handle all S3 operations, event parsing, and response formatting transparently
  • Errors in the user function should surface clearly rather than being swallowed by the S3/response plumbing
  • The decorated function should work as a standard AWS Lambda handler (compatible with lambda_handler entry point patterns)
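One way the chunk contract and the error-surfacing requirement above might be enforced is to validate the user function's return value and re-raise any failure with the source key attached. The sketch below is only an illustration under that assumption; TransformationError, validate_chunks, and run_user_function are hypothetical names, not part of any existing SDK.

```python
from typing import Any, Callable


class TransformationError(Exception):
    """Hypothetical error type that names the failing function and input."""


def validate_chunks(chunks: Any) -> list:
    """Check the return value against the chunk contract described above."""
    if not isinstance(chunks, list):
        raise TypeError(f"expected list[dict], got {type(chunks).__name__}")
    for i, chunk in enumerate(chunks):
        if not isinstance(chunk, dict):
            raise TypeError(f"chunk {i}: expected dict, got {type(chunk).__name__}")
        if not isinstance(chunk.get("contentBody"), str):
            raise ValueError(f"chunk {i}: 'contentBody' must be a str")
        if not isinstance(chunk.get("contentMetadata", {}), dict):
            raise ValueError(f"chunk {i}: 'contentMetadata' must be a dict when present")
    return chunks


def run_user_function(
    fn: Callable[[str, dict], list], content: str, metadata: dict, source_key: str
) -> list:
    """Call fn and validate its output; chain the original exception on failure."""
    try:
        return validate_chunks(fn(content, metadata))
    except Exception as exc:
        # "from exc" keeps the user's original traceback visible in the logs.
        raise TransformationError(
            f"{fn.__name__} failed on {source_key!r}: {exc}"
        ) from exc
```

Raising before anything is written back means a malformed chunk fails the invocation loudly instead of producing an output batch the ingestion pipeline might silently reject.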

Additional considerations

  • Should the decorator support async transformation functions?
  • Should there be a lower-level variant that gives access to the raw batch (for cases where per-document iteration isn't desired)?
  • Consider exposing the source document metadata (filename, content type, data source ID) to the user function for context-aware transformations
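On the first question, one possible approach is for the decorator to detect coroutine functions and drive them to completion itself, so the Lambda entry point stays synchronous. The helper below is a hypothetical sketch of that idea (call_transformation is not an existing API) and does not address the raw-batch or source-metadata questions.

```python
import asyncio
import inspect
from typing import Callable


def call_transformation(fn: Callable, content: str, metadata: dict) -> list:
    """Call fn whether it is a plain function or an `async def` (hypothetical helper)."""
    if inspect.iscoroutinefunction(fn):
        # asyncio.run starts a fresh event loop per call; it raises RuntimeError
        # if invoked from code that is already running inside an event loop.
        return asyncio.run(fn(content, metadata))
    return fn(content, metadata)
```

Starting a new event loop for every content item is simple but not free. A fuller design might gather all items of a batch into a single asyncio.run call instead.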
