feat(eventrecorder): add Kafka output #5246
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info: configuration from `.coderabbit.yaml` (review profile: CHILL, plan: Enterprise). 1 file was ignored due to path filters. Of the 7 files selected for processing, 1 was skipped due to trivial changes and 6 were skipped as similar to previous changes.
📝 Walkthrough

Adds Kafka output to the event recorder: docs and config schema, YAML parsing/validation, `KafkaOutput` implemented with franz-go (bounded enqueue, async produce, error classification, `Close` draining), proto fast-path integration, metrics, and tests.

Changes: Kafka Event Recorder Output
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant EventRecorder
    participant KafkaOutput
    participant Dispatcher
    participant FranzGo as Franz-Go Client
    participant Kafka

    EventRecorder->>KafkaOutput: SendProto(event)
    KafkaOutput->>KafkaOutput: enqueue(marshaled bytes)
    alt buffer full
        KafkaOutput->>KafkaOutput: increment drops counter
    else buffer has space
        KafkaOutput->>Dispatcher: signal work available
    end
    Dispatcher->>FranzGo: Produce record keyed by instance
    FranzGo->>Kafka: produce to topic with acks/compression
    Kafka-->>FranzGo: callback (error or success)
    FranzGo->>Dispatcher: update metrics (errors or success)

    EventRecorder->>KafkaOutput: Close()
    KafkaOutput->>Dispatcher: stop signal
    Dispatcher->>FranzGo: drain queued records
    FranzGo->>Kafka: flush within budget
    FranzGo-->>KafkaOutput: flush result
    KafkaOutput->>EventRecorder: closed
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/configuration.md`:
- Line 2098: Replace the inconsistent placeholder `<secret_url>` with the
documented placeholder `<secret>` in the configuration examples (specifically
the webhook URL entry shown as `url: <secret_url>`); locate the occurrence of
`<secret_url>` in the docs/configuration.md snippet and update it to `url:
<secret>` so it matches the schema placeholder `<secret>` used elsewhere in the
document.
In `@eventrecorder/kafka.go`:
- Around lines 151-156: The synchronous startup call to client.Ping using
pingCtx/defaultKafkaPingTimeout blocks initialization; remove the blocking ping
and instead spawn a background goroutine that performs the ping with its own
context.WithTimeout, calls client.Ping inside the goroutine, logs the same
warning (using logger.Warn, "output", name, "err", pingErr) on failure, and
ensures the context cancel is called inside the goroutine to avoid leaks; keep
the main init path non-blocking and do not call cancel() immediately from the
main thread.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 7d611017-d7d4-4c31-a56b-55a1f37efc7d
⛔ Files ignored due to path filters (1)
`go.sum` is excluded by `!**/*.sum`
📒 Files selected for processing (7)
- `docs/configuration.md`
- `eventrecorder/config.go`
- `eventrecorder/eventrecorder.go`
- `eventrecorder/eventrecorder_test.go`
- `eventrecorder/kafka.go`
- `eventrecorder/kafka_test.go`
- `go.mod`
Force-pushed from 72da8ae to 590738d.
Document the event recorder configuration, including all supported outputs.

Signed-off-by: Siavash Safi <siavash@cloudflare.com>
Add a third event recorder destination alongside file and webhook that
produces serialized events to a Kafka topic via franz-go.
Configuration is per-output under `event_recorder.outputs`:

    event_recorder:
      outputs:
        - type: kafka
          brokers: ["kafka-1:9093", "kafka-2:9093"]
          topic: alertmanager.events
          format: json            # or "protobuf"
          acks: leader            # "none" | "leader" | "all"
          compression: snappy     # "none" | "gzip" | "snappy" | "lz4" | "zstd"
          buffer_size: 1024
          tls_config: { ... }

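The accepted values listed in the example suggest a straightforward validation step at config load. A minimal sketch, assuming a hypothetical `kafkaOutputConfig` struct whose YAML tags mirror the keys shown; the PR's actual types, defaults, and error messages may differ (it may, for example, fill in defaults for empty fields rather than rejecting them as this sketch does). `errors.Join` lets one load report every problem at once.

```go
package main

import (
	"errors"
	"fmt"
	"slices"
)

// kafkaOutputConfig is a hypothetical mirror of one `type: kafka` entry
// under event_recorder.outputs; tags follow the YAML keys in the example.
type kafkaOutputConfig struct {
	Brokers     []string `yaml:"brokers"`
	Topic       string   `yaml:"topic"`
	Format      string   `yaml:"format"`
	Acks        string   `yaml:"acks"`
	Compression string   `yaml:"compression"`
	BufferSize  int      `yaml:"buffer_size"`
}

// validate rejects values outside the sets documented in the example and
// returns all problems joined into a single error (nil if none).
func (c kafkaOutputConfig) validate() error {
	var errs []error
	if len(c.Brokers) == 0 {
		errs = append(errs, errors.New("kafka: at least one broker is required"))
	}
	if c.Topic == "" {
		errs = append(errs, errors.New("kafka: topic is required"))
	}
	check := func(field, v string, allowed ...string) {
		if !slices.Contains(allowed, v) {
			errs = append(errs, fmt.Errorf("kafka: invalid %s %q (allowed: %v)", field, v, allowed))
		}
	}
	check("format", c.Format, "json", "protobuf")
	check("acks", c.Acks, "none", "leader", "all")
	check("compression", c.Compression, "none", "gzip", "snappy", "lz4", "zstd")
	if c.BufferSize <= 0 {
		errs = append(errs, fmt.Errorf("kafka: buffer_size must be positive, got %d", c.BufferSize))
	}
	return errors.Join(errs...)
}
```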
Implementation notes:
- KafkaOutput buffers events in a bounded local channel and forwards
them to franz-go's async producer. When the buffer is full, events
are dropped (counted via alertmanager_event_output_drops_total) so a
slow or unreachable broker cannot block the upstream pipeline.
- Broker unreachability at startup is logged at warn level and does
not prevent Alertmanager from starting; franz-go retries connections
in the background.
- Records use the producing instance's hostname as the message key,
so every record from a given instance lands on the same partition and
per-instance ordering is preserved.
- A new optional ProtoDestination interface lets the Kafka output
receive protobuf events directly, skipping JSON serialization when
no JSON-mode destination is configured.
- JSON marshalling in marshalAndSend is now lazy: it only happens
when at least one non-proto destination needs it.
- TLS is supported via prometheus/common's TLSConfig (mTLS or
server-only). SASL is intentionally out of scope for this change
and can be added later via franz-go's kgo.SASL options.
- Idempotent writes are disabled unless acks=all is explicitly set,
to keep the default leader-ack path compatible with franz-go, whose
idempotent producer requires acks=all.
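The `ProtoDestination` and lazy-marshalling notes describe the familiar Go pattern of probing for an optional extension interface with a type assertion. A minimal sketch of that pattern: `event`, `destination`, `protoDestination`, `jsonFunc`, and `protoFunc` are hypothetical stand-ins (the real `marshalAndSend` and `ProtoDestination` signatures are not shown here), and a plain struct replaces the protobuf message.

```go
package main

import "encoding/json"

// event is a hypothetical stand-in for the recorder's protobuf event.
type event struct {
	Kind     string `json:"kind"`
	Instance string `json:"instance"`
}

// destination is a hypothetical JSON sink (file, webhook, ...).
type destination interface {
	SendJSON(b []byte)
}

// protoDestination is an optional extension: a destination that also
// implements it receives the event directly and never needs JSON.
type protoDestination interface {
	SendProto(ev *event)
}

// marshalAndSend serializes to JSON lazily: at most once per event, and
// only when some destination lacks the proto fast path.
func marshalAndSend(ev *event, dests []destination) error {
	var payload []byte
	for _, d := range dests {
		if pd, ok := d.(protoDestination); ok {
			pd.SendProto(ev)
			continue
		}
		if payload == nil {
			b, err := json.Marshal(ev)
			if err != nil {
				return err
			}
			payload = b
		}
		d.SendJSON(payload) // every JSON sink shares the same bytes
	}
	return nil
}

// jsonFunc and protoFunc adapt plain functions to the interfaces above.
type jsonFunc func([]byte)

func (f jsonFunc) SendJSON(b []byte) { f(b) }

type protoFunc func(*event)

func (f protoFunc) SendJSON([]byte)      {} // unused: the proto path takes precedence
func (f protoFunc) SendProto(ev *event) { f(ev) }
```

With only proto-capable destinations configured, `json.Marshal` is never called; with several JSON destinations, it is called once and the result is shared.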
Metric changes:
- Rename alertmanager_event_webhook_drops_total ->
alertmanager_event_output_drops_total{output}, shared by webhook and
kafka outputs. This is a breaking metric rename; dashboards and
alerts referencing the old name need to be updated.
- Add alertmanager_event_kafka_produce_errors_total{output,error_type}
populated from franz-go's produce callback.
Testing:
- Unit tests use github.com/twmb/franz-go/pkg/kfake to spin up an
in-process broker for JSON, protobuf, message-key, drop-on-full,
close-flush, initial-ping-failure, name-stability, and config
validation cases, plus a proto fast-path test against marshalAndSend.
Dependencies added:
github.com/twmb/franz-go
github.com/twmb/franz-go/pkg/kfake (test)
github.com/twmb/franz-go/plugin/kslog
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
Force-pushed from 590738d to 510008b.
Add a third event recorder destination alongside file and webhook that
produces serialized events to a Kafka topic via franz-go.
Configuration is per-output under `event_recorder.outputs`.

Implementation notes:

- `KafkaOutput` buffers events in a bounded local channel and forwards them to franz-go's async producer. When the buffer is full, events are dropped (counted via `alertmanager_event_output_drops_total`) so a slow or unreachable broker cannot block the upstream pipeline.

Metric changes:

- `alertmanager_event_webhook_drops_total` -> `alertmanager_event_output_drops_total{output}`, shared by webhook and kafka outputs. This is a breaking metric rename; dashboards and alerts referencing the old name need to be updated.
- `alertmanager_event_kafka_produce_errors_total{output,error_type}` populated from franz-go's produce callback.

Testing:

Dependencies added:
Pull Request Checklist
Please check all the applicable boxes.
`benchstat` to compare benchmarks

Which user-facing changes does this PR introduce?
Summary by CodeRabbit
New Features
Documentation