[Alerting V2][Serverless & 9.5][M2] Adds docs for action policies, workflows, and notifications #6525
Draft: nastasha-solomon wants to merge 24 commits into main from alerting/experimental-workflows
Commits
9bdc843 Add experimental alerting features workflows and notifications pages (nastasha-solomon)
7cb32e9 Add alerting-v2 variables and replace phrase throughout (nastasha-solomon)
5280091 fixes to naming (nastasha-solomon)
2178647 [Alerting V2][Serverless & 9.5][M2] Dispatcher changes from May 2026 … (nastasha-solomon)
23bad34 Merge branch 'main' into alerting/experimental-workflows (nastasha-solomon)
c4ce673 Update explore-analyze/alerting/kibana-alerting-experimental/notifica… (nastasha-solomon)
887cba0 Merge branch 'main' into alerting/experimental-workflows (nastasha-solomon)
aa8613b Merge branch 'main' into alerting/experimental-workflows (nastasha-solomon)
9c2bf21 Merge branch 'main' into alerting/experimental-workflows (nastasha-solomon)
6392ee1 Significant revisions (nastasha-solomon)
e3bda0f fix toc issue (nastasha-solomon)
62c4850 Update explore-analyze/alerting/kibana-alerting-experimental/action-p… (nastasha-solomon)
a5cb3ec Editorial feedback (nastasha-solomon)
afd0560 Merge branch 'main' into alerting/experimental-workflows (nastasha-solomon)
595db96 Update workflows-alerting.md (nastasha-solomon)
54f5fd5 Merge branch 'main' into alerting/experimental-workflows (nastasha-solomon)
8378613 move section up (nastasha-solomon)
221897e [Alerting V2][Serverless & 9.5][M2] Action policy and workflow change… (nastasha-solomon)
0d4d278 Merge branch 'main' into alerting/experimental-workflows (nastasha-solomon)
1509590 Merge branch 'main' into alerting/experimental-workflows (nastasha-solomon)
d8c233e [Alerting V2][Serverless & 9.5][M2] Action policy and workflow change… (nastasha-solomon)
1ca3bb9 Merge branch 'main' into alerting/experimental-workflows (nastasha-solomon)
661f273 [Alerting V2][Serverless & 9.5][M2] Add action policies conceptual p… (nastasha-solomon)
bc7ab01 Merge branch 'main' into alerting/experimental-workflows (nastasha-solomon)
81 changes: 81 additions & 0 deletions
.../alerting/kibana-alerting-experimental/action-policies/about-action-policies.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,81 @@ | ||
| --- | ||
| navigation_title: About action policies | ||
| applies_to: | ||
| stack: experimental 9.5+ | ||
| serverless: experimental | ||
| products: | ||
| - id: kibana | ||
| description: "How action policies gate alert episodes through suppression, match conditions, and frequency before invoking workflows in the experimental alerting system." | ||
| --- | ||
|
|
||
| # About action policies [about-action-policies] | ||
|
|
||
| Action policies are part of the {{alerting-v2-system}} in {{kib}}. An action policy is the gating layer between an alert episode and a workflow. It decides whether and when to invoke a workflow by running the alert episode through a sequence of gates. A workflow runs only if the alert episode clears each gate in sequence. | ||
|
|
||
| ## Why policies are separate from rules [policies-separate-from-rules] | ||
|
|
||
| Policies are independent of rules. A single global policy can cover alert episodes from many rules, so a policy matching `severity: "critical"` applies regardless of which rule produced the alert episode. You can also update notification routing without touching any rule, and you can create rules without any action policy, which is useful for testing detection logic before wiring up notifications. | ||
|
|
||
| When you do need routing that is specific to one rule, create a per-rule policy and bind it to that rule at creation. | ||
|
|
||
| ## How alert episodes are gated [action-policy-gates] | ||
|
|
||
| The three gates are suppression, match conditions, and frequency: | ||
|
|
||
| * **Suppression** - Suppression checks whether the alert episode should be silenced. Episodes that are acknowledged, snoozed, or inside a maintenance window are stopped here and no workflow is invoked. For details on each mechanism and its scope, refer to [Reduce notification noise](reduce-notification-noise.md). | ||
| * **Match conditions** - Match conditions filter which alert episodes the policy applies to. You define them using [KQL](../../../query-filter/languages/kql.md). An empty match condition applies to all alert episodes within the policy's scope. | ||
| * **Frequency** - Frequency controls how often the policy can invoke its workflows for the same group of episodes, and how episodes batch before a workflow is invoked. Options are one notification per alert episode, one per notification group, or one digest for all matching episodes. If a workflow was already invoked within the cooldown period, the episode waits. | ||
|
|
||
| If any gate stops the episode, the workflow is not invoked for that policy. | ||
|
|
||
| :::{note} | ||
| Because each action policy evaluates alert episodes independently, an episode that is blocked by one policy can still trigger a workflow through a second policy with different conditions. | ||
| ::: | ||
|
|
||
| ## How policy types differ [policy-types] | ||
|
|
||
| Policies can be global or per-rule. Global policies apply across all rules in a space and suit most use cases. Per-rule policies apply to a single rule and give you precise control over routing for that rule without affecting anything else in the space. | ||
|
|
||
| ### Global policies | ||
|
|
||
| A global policy applies to all alert episodes in the space, from any rule. When an alert episode is produced, the dispatcher evaluates all enabled global policies that are not snoozed. Global is the default policy type. | ||
|
|
||
| ### Per-rule policies | ||
|
|
||
| A per-rule policy is scoped to a single rule. It applies only to alert episodes produced by that specific rule. Use a per-rule policy when routing is specific to one rule and you do not want it to affect other rules in the space. The rule association is set at creation and cannot be changed. | ||
|
|
||
| ## How policies apply to rules [how-policies-apply] | ||
|
|
||
| How a policy applies depends on whether it is global or per-rule. Multiple policies can match the same alert episode, and each runs independently. There is no precedence or merging between them. If no policy matches an alert episode, no workflow is invoked and no notification is sent. | ||
|
|
||
| ### Global policies | ||
|
|
||
| Global policies don't reference rules directly. You scope them using KQL over alert episode and rule fields, for example `rule.tags: "checkout"` or `data.severity: "critical"`. A global policy applies to every matching alert episode in the space, from any rule. | ||
|
|
||
| ### Per-rule policies | ||
|
|
||
| Per-rule policies are bound to a specific rule at creation. They apply only to alert episodes from that rule, and you can still use match conditions to filter further within that rule's alert episodes. | ||
|
|
||
| ## How action policies are evaluated [how-action-policies-evaluated] | ||
|
|
||
| {{kib}} runs a background process called the dispatcher that checks for eligible alert episodes on a short interval (around 5 seconds) and evaluates action policies against them. The dispatcher runs on its own cycle, separate from the rule schedule. | ||
|
|
||
| For each enabled policy that is not snoozed, the dispatcher works through the following steps: | ||
|
|
||
| 1. **Gating:** Is the alert episode acknowledged, snoozed, or deactivated? If so, skip. Refer to [Reduce notification noise](reduce-notification-noise.md) to learn more. | ||
| 2. **Matcher:** Does the alert episode match the policy's KQL? If not, skip this policy. | ||
| 3. **Grouping:** How should matching alert episodes batch into notification groups? | ||
| 4. **Frequency:** Has a workflow already been invoked for this notification group recently? If so, wait. | ||
| 5. **Destinations:** Invoke the configured workflows. | ||
|
|
||
| Because the dispatcher runs on its own cycle, a workflow might not be invoked immediately after a rule evaluates. | ||
|
|
||
| :::{tip} | ||
| Severity changes can cause a policy to match an episode for the first time, which fires a notification even if the episode is not new. For example, if a policy is scoped to `severity: "critical"` and an episode escalates from `low` to `critical`, the policy fires because it has no prior notification record for that episode. However, a severity change alone does not re-trigger a policy that already matched the episode. Only a status change or the expiry of a time-based throttle can do that. For details and examples, refer to [Severity escalation and de-escalation](common-action-policy-scenarios.md#severity-escalation). | ||
| ::: | ||
|
|
||
| ## Next steps | ||
|
|
||
| - [Create and configure an action policy](create-configure-action-policy.md) to set up policy type, match conditions, grouping, frequency, and workflow destinations. | ||
| - [Manage action policies](manage-action-policies.md) to enable, disable, snooze, edit, or delete the policies in your space. | ||
|
|
||
| - [Action policy reference](action-policy-reference.md) to look up available match condition fields, grouping modes, and frequency options. | ||
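
The evaluation order described in about-action-policies.md (suppression, match conditions, grouping, frequency, destinations) can be sketched roughly as follows. This is an illustration only, not the dispatcher's implementation: `Episode`, `Policy`, `evaluate`, the `cooldown_s` field, and the returned labels are all hypothetical names, the match condition is modeled as a plain Python predicate rather than KQL, and grouping is fixed to one group per episode.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Episode:
    episode_id: str
    severity: str
    suppressed: bool = False  # stands in for acknowledged, snoozed, or deactivated


@dataclass
class Policy:
    name: str
    matches: Callable[[Episode], bool]  # hypothetical stand-in for the KQL match condition
    cooldown_s: float                   # hypothetical stand-in for the frequency setting
    enabled: bool = True
    snoozed: bool = False
    # Time of the last workflow invocation, keyed by notification group.
    last_invoked: Dict[str, float] = field(default_factory=dict)


def evaluate(policy: Policy, episode: Episode, now: float) -> str:
    """Run one episode through one policy and return an illustrative label."""
    if not policy.enabled or policy.snoozed:
        return "skipped"          # disabled or snoozed policies are not evaluated
    if episode.suppressed:
        return "suppressed"       # step 1: gating
    if not policy.matches(episode):
        return "unmatched"        # step 2: matcher
    group_key = episode.episode_id  # step 3: grouping (one group per episode here)
    last = policy.last_invoked.get(group_key)
    if last is not None and now - last < policy.cooldown_s:
        return "throttled"        # step 4: frequency
    policy.last_invoked[group_key] = now
    return "dispatched"           # step 5: destinations would be invoked here


if __name__ == "__main__":
    critical_only = Policy("critical-only", lambda e: e.severity == "critical", 3600)
    catch_all = Policy("catch-all", lambda e: True, 3600)
    episode = Episode("ep-001", "low")
    # Each policy evaluates the episode independently of the other.
    for policy in (critical_only, catch_all):
        print(policy.name, evaluate(policy, episode, now=0.0))
```

Because each policy keeps its own state and is evaluated on its own, the same episode can be stopped by one policy and still reach a workflow through another, which is the behavior the note in the gating section describes.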
99 changes: 99 additions & 0 deletions
...lerting/kibana-alerting-experimental/action-policies/action-policy-reference.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,99 @@ | ||
| --- | ||
| navigation_title: Action policy reference | ||
| applies_to: | ||
| stack: experimental 9.5+ | ||
| serverless: experimental | ||
| products: | ||
| - id: kibana | ||
| description: "Grouping modes, frequency options, dispatch outcomes, and match conditions field reference for action policies in the experimental alerting system." | ||
| --- | ||
|
|
||
| # Action policy reference for the {{alerting-v2-system}} [action-policy-reference] | ||
|
|
||
| Action policies are part of the {{alerting-v2-system}} in {{kib}}. This page is a reference for match conditions fields, grouping modes, frequency options, and dispatch outcomes. For step-by-step guidance, refer to [Create and configure an action policy](create-configure-action-policy.md). | ||
|
|
||
| ## Match conditions fields [matcher-fields] | ||
|
|
||
| Use these fields in the **Match conditions** expression to filter which alert episodes a policy applies to. Combine them with standard [KQL](../../../query-filter/languages/kql.md) operators, for example `severity: "critical" AND episode_status: "active"`. | ||
|
|
||
| | Field | Type | Description | Accepted values | Example | | ||
| |---|---|---|---|---| | ||
| | `episode_id` | string | Unique identifier of the alert episode. | Any string | `episode_id: "ep-001"` | | ||
| | `episode_status` | string | Current lifecycle status of the alert episode. | `inactive`, `pending`, `active`, `recovering` | `episode_status: "active"` | | ||
| | `severity` | string | Current severity of the alert episode. Populated when the rule's ES\|QL query includes a `severity` column whose value matches a supported level (case-insensitive). Unrecognized values are silently ignored and the field is absent. Not set during recovery. Use to route high-severity episodes to dedicated workflows. | `info`, `low`, `medium`, `high`, `critical` | `severity: "critical" OR severity: "high"` | | ||
| | `group_hash` | string | Stable hash identifying the alert series the alert episode belongs to. | Any string | `group_hash: "abc123"` | | ||
| | `last_event_timestamp` | string | ISO 8601 timestamp of the most recent event recorded for the alert episode. | ISO 8601 timestamp | `last_event_timestamp > "2026-01-01"` | | ||
| | `rule.id` | string | Unique identifier of the rule that generated the alert episode. | Any string | `rule.id: "rule-001"` | | ||
| | `rule.name` | string | Display name of the rule. | Any string | `rule.name: "High CPU"` | | ||
| | `rule.tags` | string[] | Tags attached to the rule. Use to match alert episodes from rules with a specific tag. | Any string | `rule.tags: "payment-service"` | | ||
| | `data.*` | object | Dynamic payload fields sent by the rule. Available fields depend on the rule type and configuration. Use for rule-specific fields not covered by the standard episode fields above. | Depends on rule type | `data.host.name: "web-01"` | | ||
|
|
||
| <!--[CONTENT NEEDED: | ||
| When rule authoring docs are created (issue #6689), link from this table row to the rule authoring page that explains how to include a `severity` column in the ES|QL query. The full severity contract (column name, case-insensitivity, silent-ignore behavior) belongs in the rule authoring reference, not here.] | ||
| --> | ||
|
|
||
| ## Notify per options [notification-grouping] | ||
|
|
||
| Controls how the policy batches matching episodes before sending a notification. | ||
|
|
||
| | Option | Description | When to use | | ||
| |---|---|---| | ||
| | Episode | The policy sends one notification per alert episode, independently of other episodes. Default selection. | You need per-issue visibility and want to handle each problem separately. | | ||
| | Group | The policy bundles alert episodes that share the same value for a specified `data.*` field into one notification per unique value. Each unique value forms a **notification group**. | A rule produces many related alert episodes, such as one per service or host, and you want to reduce noise by batching them into shared notifications. | | ||
| | Digest | The policy combines all matching alert episodes into a single notification, regardless of what they have in common. | You want a single periodic summary of everything that matched, rather than individual alert episodes. | | ||
|
|
||
| ## Frequency [throttle-strategies] | ||
|
|
||
| Frequency controls how often the policy fires for a given alert episode or notification group. The available options depend on the **Notify per** setting. Not all options are valid for all modes. | ||
|
|
||
| | Option | Description | When to use | | ||
| |---|---|---| | ||
| | On status change | Notifies when the alert episode status changes, for example from active to recovering. One notification per transition. | You only need to know when something breaks and when it's resolved. Use this when you trust your ticketing or incident workflow to track ongoing issues. | | ||
| | On status change + repeat at interval | Notifies on status change, then resends notifications at a regular interval while the alert episode remains in the same status. | You want status change notifications plus periodic reminders that a problem is still unresolved, in case it has been missed or pushed aside. | | ||
| | At most once every… | Caps notifications at one per alert episode or notification group within the chosen interval, regardless of how often the rule runs. | You want to limit notification volume for noisy rules without missing new or ongoing issues. | |
| | Every evaluation | Notifies on every rule evaluation. Can be noisy. Use sparingly and only with infrequent rule schedules. | You need a full audit trail of every evaluation, or the rule runs infrequently enough that noise isn't a concern. | | ||
|
|
||
| ### Frequency options for Episode [frequency-when-episode-per_episode] | ||
|
|
||
| Available frequency options when you set **Notify per** to **Episode**. | ||
|
|
||
| | Option | Description | Example | | ||
| |---|---|---| | ||
| | On status change | Notifies once when the alert episode opens and once when it recovers. No repeat notifications while it remains active. | A host goes down at 9:00am → one notification. Recovers at 11:00am → one notification. No notifications between them. | | ||
| | On status change + repeat at interval | Same as On status change, but also sends a reminder at a set interval while the alert episode is still active. | A host goes down at 9:00am → notification. With a 1h repeat: reminder at 10:00am, 11:00am. Recovers at 11:30am → notification. | | ||
| | Every evaluation | Fires on every rule evaluation, regardless of status. Can be noisy on frequent rule schedules. Avoid in production. | A rule running every 5 minutes with one active alert episode produces up to 288 notifications per day. | | ||
|
|
||
| ### Frequency options for Group | ||
|
|
||
| Available frequency options when you set **Notify per** to **Group**. | ||
|
|
||
| | Option | Description | Example | | ||
| |---|---|---| | ||
| | At most once every… | Limits how often each notification group can notify, regardless of how many alert episodes match or how often the rule runs. | 10 alert episodes share `data.host.name: "web-01"`. With a 1h limit, you get at most one notification per hour for that notification group. | | ||
| | Every evaluation | Fires on every rule evaluation for each unique value in the group-by field. Still noisy on frequent rule schedules. | A rule running every 10 minutes with 5 unique host values produces up to 6 notifications per host per hour. | | ||
|
|
||
| ### Frequency options for Digest | ||
|
|
||
| Available frequency options when you set **Notify per** to **Digest**. | ||
|
|
||
| | Option | Description | Example | | ||
| |---|---|---| | ||
| | At most once every… (default) | Caps digest delivery to at most one bundled summary within the chosen interval, regardless of how often the rule runs. | A rule running every 5 minutes with a 1h digest interval sends one bundled summary per hour containing all matching alert episodes from that period. | | ||
| | Every evaluation | Fires on every rule run, bundling all matching alert episodes into one message. Can be noisy on frequent rule schedules. | A rule running every 30 minutes with 20 matching alert episodes produces one summary every 30 minutes containing all 20. | | ||
|
|
||
| ## Dispatch outcomes | ||
|
|
||
| The dispatcher records each run with one of the following outcomes. To investigate delivery issues, open Discover, query the `.alert-actions` index, and filter by the `action_type` field. | ||
|
|
||
| | Outcome | What happened | | ||
| |---|---| | ||
| | `dispatched` | The dispatcher invoked a workflow for the alert episode. | | ||
| | `throttled` | The alert episode matched a policy but was rate-limited by the frequency setting. No workflow ran. This is expected behavior, not an error. | | ||
| | `suppressed` | Dispatch was blocked. The alert episode was acknowledged, snoozed, or deactivated, or the space is currently in a [maintenance window](../../alerts/maintenance-windows.md). | | ||
| | `unmatched` | No action policy matched the alert episode. No workflow ran. | | ||
|
|
||
| ## Related pages | ||
|
|
||
| - [Create and configure an action policy](create-configure-action-policy.md) to apply these fields and options when setting up a policy. | ||
| - [Manage action policies in {{alerting-v2-system}}](manage-action-policies.md) to enable, disable, snooze, or audit your policies. | ||
|
|
||
| - [About action policies](about-action-policies.md) to understand how action policies evaluate and gate alert episodes. | ||
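
The notification counts quoted in the frequency tables of action-policy-reference.md follow from simple arithmetic, and the Group option amounts to bucketing episodes by a field value. The sketch below is a rough illustration under stated assumptions: `group_episodes` and `max_notifications` are hypothetical helpers, episodes are modeled as plain dictionaries, and the `data.*` payload is flattened to a single key such as `"host.name"` for simplicity.

```python
from collections import defaultdict
from typing import Dict, List


def group_episodes(episodes: List[dict], group_by: str) -> Dict[str, List[str]]:
    """Bucket episode IDs into notification groups by one data field value."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for episode in episodes:
        groups[episode["data"][group_by]].append(episode["episode_id"])
    return dict(groups)


def max_notifications(rule_interval_min: int, window_min: int, throttle_min: int = 0) -> int:
    """Upper bound on notifications for one episode or group within a window.

    With no throttle, this models "Every evaluation": one per rule run.
    With a throttle, this models "At most once every...": capped per interval.
    """
    evaluations = window_min // rule_interval_min
    if throttle_min <= 0:
        return evaluations
    return min(evaluations, window_min // throttle_min)


if __name__ == "__main__":
    episodes = [
        {"episode_id": "ep-1", "data": {"host.name": "web-01"}},
        {"episode_id": "ep-2", "data": {"host.name": "web-01"}},
        {"episode_id": "ep-3", "data": {"host.name": "web-02"}},
    ]
    print(group_episodes(episodes, "host.name"))
    print(max_notifications(5, 24 * 60))                   # every evaluation, 5-minute rule, one day
    print(max_notifications(10, 60))                       # every evaluation, 10-minute rule, one hour
    print(max_notifications(5, 24 * 60, throttle_min=60))  # at most once every hour, one day
```

The first two calls to `max_notifications` reproduce the 288-per-day and 6-per-hour figures from the tables. The third shows how a one-hour cap bounds the same 5-minute rule at 24 notifications per day for a given episode or group.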