Skip to content

Added Applications Events for application-level metrics and custom label support in KafkaSink#79

Merged
LucaCanali merged 9 commits into
LucaCanali:masterfrom
aashishchauhan06:app_event_kafka_sink
Apr 24, 2026
Merged

Added Applications Events for application-level metrics and custom label support in KafkaSink#79
LucaCanali merged 9 commits into
LucaCanali:masterfrom
aashishchauhan06:app_event_kafka_sink

Conversation

@aashishchauhan06
Copy link
Copy Markdown
Contributor

@aashishchauhan06 aashishchauhan06 commented Feb 25, 2026

While the existing KafkaSink provides detailed stage, executor, and query metrics, there is currently no built-in support for:

  • Application-level aggregated metrics
  • Custom metadata labels via Spark configuration
  • applications_started and applications_ended events

This PR addresses these gaps while preserving backward compatibility with the existing KafkaSink. This PR adds:

  • Application-level metrics:
  • Executor counts
  • Stage counts
  • Task counts
  • Support for app lablels via Spark configuration: spark.sparkmeasure.appLabels.*
  • Enhanced applications_started and applications_ended events enriched with metadata

Happy to adjust naming, packaging, or documentation if needed.

@aashishchauhan06 aashishchauhan06 changed the title Add AppEventKafkaSink and AppEventKafkaSinkExtended for application-level metrics and custom metadata support Add AppEventKafkaSink and AppEventKafkaSinkExtended for application-level metrics and custom label support Feb 26, 2026
@aashishchauhan06 aashishchauhan06 changed the title Add AppEventKafkaSink and AppEventKafkaSinkExtended for application-level metrics and custom label support Added Applications Events for application-level metrics and custom label support in KafkaSink Feb 26, 2026
@LucaCanali
Copy link
Copy Markdown
Owner

Hi,
Thanks for the contribution, this is useful work. I think it should be introduced as a new sink version rather than changing KafkaSink behavior in place.

This PR adds substantial new semantics and payload fields (application lifecycle events, counters, labels, config payload), so it can break existing consumers that rely on the current Kafka message contract. Could you please move this into a new listener class such as KafkaSinkV2 ?

Also, before merge, I’d ask to fix configurations -> conf.getAll.toMap in applications_ended can leak secrets; please remove it or apply a strict allowlist/redaction strategy.

If you go with KafkaSinkV2, please also add documentation with:

  • schema docs for new event types/fields,
  • tests for emitted payload correctness (especially app start/end and counters),
  • clear compatibility note for existing users of KafkaSink.

Thank you,
Luca

@aashishchauhan06
Copy link
Copy Markdown
Contributor Author

Hi, Thanks for the contribution, this is useful work. I think it should be introduced as a new sink version rather than changing KafkaSink behavior in place.

This PR adds substantial new semantics and payload fields (application lifecycle events, counters, labels, config payload), so it can break existing consumers that rely on the current Kafka message contract. Could you please move this into a new listener class such as KafkaSinkV2 ?

Also, before merge, I’d ask to fix configurations -> conf.getAll.toMap in applications_ended can leak secrets; please remove it or apply a strict allowlist/redaction strategy.

If you go with KafkaSinkV2, please also add documentation with:

  • schema docs for new event types/fields,
  • tests for emitted payload correctness (especially app start/end and counters),
  • clear compatibility note for existing users of KafkaSink.

Thank you, Luca

Thanks for the detailed review and for the suggestions — really appreciate you taking the time to go through the changes. I’ll push an updated version of the PR once these changes are in place.

@pp-achauhan
Copy link
Copy Markdown
Contributor

Hi, Thanks for the contribution, this is useful work. I think it should be introduced as a new sink version rather than changing KafkaSink behavior in place.
This PR adds substantial new semantics and payload fields (application lifecycle events, counters, labels, config payload), so it can break existing consumers that rely on the current Kafka message contract. Could you please move this into a new listener class such as KafkaSinkV2 ?
Also, before merge, I’d ask to fix configurations -> conf.getAll.toMap in applications_ended can leak secrets; please remove it or apply a strict allowlist/redaction strategy.
If you go with KafkaSinkV2, please also add documentation with:

  • schema docs for new event types/fields,
  • tests for emitted payload correctness (especially app start/end and counters),
  • clear compatibility note for existing users of KafkaSink.

Thank you, Luca

Thanks for the detailed review and for the suggestions — really appreciate you taking the time to go through the changes. I’ll push an updated version of the PR once these changes are in place.

@LucaCanali Please review PR, i have modified PR to have backward compatibility.

@LucaCanali LucaCanali merged commit fb8cdeb into LucaCanali:master Apr 24, 2026
4 checks passed
@LucaCanali
Copy link
Copy Markdown
Owner

Thank you @pp-achauhan !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants