Add team_name tag to Celery executor metrics#69092
Open
SameerMesiah97 wants to merge 1 commit into
metric to improve per-team observability in multi-team deployments. Add unit test coverage for the task timeout retry path, verifying that the metric is emitted with and without the team_name tag.
Description
This change adds the team_name tag to the celery.task_timeout_error metric emitted by the Celery executor when retrying workload publication after an AirflowTaskTimeout.
The celery.execute_command.failure metric was intentionally left unchanged. That metric is only emitted by execute_command(), which is used on Airflow versions prior to 3.0, whereas multi-team support (and team_name) was introduced in Airflow 3.1. As a result, there is no supported execution path where celery.execute_command.failure can be associated with a team.
Rationale
The celery.task_timeout_error metric is used to identify workload publication timeouts. Adding the team_name tag allows these events to be attributed to the owning team in multi-team deployments, making it easier to diagnose publish failures and identify affected teams.
Tests
Added a unit test covering the previously untested workload publication timeout retry path. The test verifies that celery.task_timeout_error is emitted with the expected tags both with and without team_name.
Backwards Compatibility
This change is additive only. The team_name tag is only emitted when available; otherwise, the metric is emitted unchanged.
Related: #68996
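The additive behaviour described under Backwards Compatibility can be illustrated with a minimal, self-contained sketch. It is not the code in this PR: RecordingStats, timeout_metric_tags and record_publish_timeout are hypothetical names invented for illustration, and the stand-in client only approximates how a metrics client might accept tags. The only details taken from the description are the metric name celery.task_timeout_error and the team_name tag.

```python
from typing import Dict, List, Optional, Tuple


class RecordingStats:
    """Hypothetical stand-in for a metrics client; records every increment."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Dict[str, str]]] = []

    def incr(self, stat: str, tags: Optional[Dict[str, str]] = None) -> None:
        # Copy the tags so later mutation by the caller cannot alter the record.
        self.calls.append((stat, dict(tags or {})))


def timeout_metric_tags(team_name: Optional[str]) -> Dict[str, str]:
    """Build the tag set for the timeout metric (hypothetical helper)."""
    # Add the tag only when a team is known, so deployments without
    # multi-team support keep emitting the metric with no extra tags.
    return {"team_name": team_name} if team_name else {}


def record_publish_timeout(stats: RecordingStats, team_name: Optional[str] = None) -> None:
    """Emit the timeout metric once, tagged with the team when available."""
    stats.incr("celery.task_timeout_error", tags=timeout_metric_tags(team_name))


stats = RecordingStats()
record_publish_timeout(stats)                      # no team configured
record_publish_timeout(stats, team_name="team_a")  # multi-team deployment
```

A test along the lines described under Tests would then assert on both recorded calls: one with an empty tag set and one carrying team_name.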
Was generative AI tooling used to co-author this PR?
Generated-by: [GPT 5.5] following the guidelines