diff --git a/docset.yml b/docset.yml index 51b1a89f12..480c9b0d28 100644 --- a/docset.yml +++ b/docset.yml @@ -124,8 +124,8 @@ subs: ls-pipelines-app: "Logstash Pipelines" maint-windows-app: "Maintenance Windows" maint-windows-cap: "Maintenance windows" - alerting-v2: "experimental alerting features" - alerting-v2-cap: "Experimental alerting features" + alerting-v2-system: "experimental alerting system" + alerting-v2-system-cap: "Experimental alerting system" custom-roles-app: "Custom Roles" data-source: "data view" data-sources: "data views" diff --git a/explore-analyze/alerting/kibana-alerting-experimental/action-policies/about-action-policies.md b/explore-analyze/alerting/kibana-alerting-experimental/action-policies/about-action-policies.md new file mode 100644 index 0000000000..5c2ff530cb --- /dev/null +++ b/explore-analyze/alerting/kibana-alerting-experimental/action-policies/about-action-policies.md @@ -0,0 +1,81 @@ +--- +navigation_title: About action policies +applies_to: + stack: experimental 9.5+ + serverless: experimental +products: + - id: kibana +description: "How action policies gate alert episodes through suppression, match conditions, and frequency before invoking workflows in the experimental alerting system." +--- + +# About action policies [about-action-policies] + +Action policies are part of the {{alerting-v2-system}} in {{kib}}. An action policy is the gating layer between an alert episode and a workflow. It decides whether and when to invoke a workflow by running the alert episode through a sequence of gates. A workflow runs only if the alert episode clears each gate in sequence. + +## Why policies are separate from rules [policies-separate-from-rules] + +Policies are independent of rules. A single global policy can cover alert episodes from many rules, so a policy matching `severity: "critical"` applies regardless of which rule produced the alert episode. 
You can also update notification routing without touching any rule, and you can create rules without any action policy, which is useful for testing detection logic before wiring up notifications. + +When you do need routing that is specific to one rule, create a per-rule policy and bind it to that rule at creation. + +## How alert episodes are gated [action-policy-gates] + +The three gates are suppression, match conditions, and frequency: + +* **Suppression** - Suppression checks whether the alert episode should be silenced. Episodes that are acknowledged, snoozed, or inside a maintenance window are stopped here and no workflow is invoked. For details on each mechanism and its scope, refer to [Reduce notification noise](reduce-notification-noise.md). +* **Match conditions** - Match conditions filter which alert episodes the policy applies to. You define them using [KQL](../../../query-filter/languages/kql.md). An empty match condition applies to all alert episodes within the policy's scope. +* **Frequency** - Frequency controls how often the policy can invoke its workflows for the same group of episodes, and how episodes batch before a workflow is invoked. Options are one notification per alert episode, one per notification group, or one digest for all matching episodes. If a workflow was already invoked within the cooldown period, the episode waits. + +If any gate stops the episode, the workflow is not invoked for that policy. + +:::{note} +Because each action policy evaluates alert episodes independently, an episode that is blocked by one policy can still trigger a workflow through a second policy with different conditions. +::: + +## How policy types differ [policy-types] + +Policies can be global or per-rule. Global policies apply across all rules in a space and suit most use cases. Per-rule policies apply to a single rule and give you precise control over routing for that rule without affecting anything else in the space. 
+ +### Global policies + +A global policy applies to all alert episodes in the space, from any rule. When an alert episode is produced, the dispatcher evaluates all enabled global policies that are not snoozed. Global is the default type and suits most use cases. + +### Per-rule policies + +A per-rule policy is scoped to a single rule. It applies only to alert episodes produced by that specific rule. Use a per-rule policy when routing is specific to one rule and you do not want it to affect other rules in the space. The rule association is set at creation and cannot be changed. + +## How policies apply to rules [how-policies-apply] + +How a policy applies depends on whether it is global or per-rule. Multiple policies can match the same alert episode, and each runs independently. There is no precedence or merging between them. If no policy matches an alert episode, no workflow is invoked and no notification is sent. + +### Global policies + +Global policies don't reference rules directly. You scope them using KQL over alert episode and rule fields, for example `rule.tags: "checkout"` or `data.severity: "critical"`. A global policy applies to every matching alert episode in the space, from any rule. + +### Per-rule policies + +Per-rule policies are bound to a specific rule at creation. They apply only to alert episodes from that rule, and you can still use match conditions to filter further within that rule's alert episodes. + +## How action policies are evaluated [how-action-policies-evaluated] + +{{kib}} runs a background process called the dispatcher that checks for eligible alert episodes on a short interval (around 5 seconds) and evaluates action policies against them. The dispatcher runs on its own cycle, separate from the rule schedule. + +For each enabled policy that is not snoozed, the dispatcher works through the following steps: + +1. **Gating:** Is the alert episode acknowledged, snoozed, or deactivated? If so, skip. 
Refer to [Reduce notification noise](reduce-notification-noise.md) to learn more. +2. **Matcher:** Does the alert episode match the policy's KQL? If not, skip this policy. +3. **Grouping:** How should matching alert episodes batch into notification groups? +4. **Frequency:** Has a workflow already been invoked for this notification group recently? If so, wait. +5. **Destinations:** Invoke the configured workflows. + +Workflow invocations may not happen immediately after a rule evaluates. + +:::{tip} +Severity changes can cause a policy to match an episode for the first time, which fires a notification even if the episode is not new. For example, if a policy is scoped to `severity: "critical"` and an episode escalates from `low` to `critical`, the policy fires because it has no prior notification record for that episode. However, a severity change alone does not re-trigger a policy that already matched the episode. Only a status change or the expiry of a time-based throttle can do that. For details and examples, refer to [Severity escalation and de-escalation](common-action-policy-scenarios.md#severity-escalation). +::: + +## Next steps + +- [Create and configure an action policy](create-configure-action-policy.md) to set up policy type, match conditions, grouping, frequency, and workflow destinations. +- [Manage action policies](manage-action-policies.md) to enable, disable, snooze, edit, or delete the policies in your space. +- [Action policy reference](action-policy-reference.md) to look up available match condition fields, grouping modes, and frequency options. 
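The dispatcher steps listed under "How action policies are evaluated" can be sketched in Python. This is an illustrative model only, not {{kib}} code. The `Episode` and `Policy` classes, the `matches` callable that stands in for KQL evaluation, and the `evaluate` function are all hypothetical names. The sketch also assumes a single time-based interval for the frequency step.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Episode:
    # Hypothetical, simplified alert episode.
    episode_id: str
    severity: str
    data: dict
    suppressed: bool = False  # acknowledged, snoozed, or deactivated


@dataclass
class Policy:
    # Hypothetical, simplified action policy.
    name: str
    matches: Callable[[Episode], bool]  # stand-in for the KQL match condition
    group_by: Optional[str] = None      # data field name; None means per episode
    interval_s: float = 0.0             # minimum seconds between invocations
    enabled: bool = True
    snoozed: bool = False
    last_fired: dict = field(default_factory=dict)  # group key -> timestamp


def evaluate(policy: Policy, episodes: list, now: float) -> list:
    """Return the notification group keys for which workflows would be invoked."""
    if not policy.enabled or policy.snoozed:
        return []
    groups: dict = {}
    for ep in episodes:
        if ep.suppressed:            # 1. Gating
            continue
        if not policy.matches(ep):   # 2. Matcher
            continue
        # 3. Grouping: one group per field value, or one per episode
        key = ep.data.get(policy.group_by) if policy.group_by else ep.episode_id
        groups.setdefault(key, []).append(ep)
    invoked = []
    for key in groups:
        last = policy.last_fired.get(key)
        if last is not None and now - last < policy.interval_s:  # 4. Frequency
            continue
        policy.last_fired[key] = now
        invoked.append(key)          # 5. Destinations: workflows would run here
    return invoked
```

Because each `Policy` object keeps its own `last_fired` state, two policies evaluating the same episode do not affect each other, which mirrors the independence described in the note above.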
diff --git a/explore-analyze/alerting/kibana-alerting-experimental/action-policies/action-policy-reference.md b/explore-analyze/alerting/kibana-alerting-experimental/action-policies/action-policy-reference.md new file mode 100644 index 0000000000..7f9141c6fb --- /dev/null +++ b/explore-analyze/alerting/kibana-alerting-experimental/action-policies/action-policy-reference.md @@ -0,0 +1,99 @@ +--- +navigation_title: Action policy reference +applies_to: + stack: experimental 9.5+ + serverless: experimental +products: + - id: kibana +description: "Grouping modes, frequency options, dispatch outcomes, and match conditions field reference for action policies in the experimental alerting system." +--- + +# Action policy reference for the {{alerting-v2-system}} [action-policy-reference] + +Action policies are part of the {{alerting-v2-system}} in {{kib}}. This page is a reference for match conditions fields, grouping modes, frequency options, and dispatch outcomes. For step-by-step guidance, refer to [Create and configure an action policy](create-configure-action-policy.md). + +## Match conditions fields [matcher-fields] + +Use these fields in the **Match conditions** expression to filter which alert episodes a policy applies to. Combine them with standard [KQL](../../../query-filter/languages/kql.md) operators, for example `severity: "critical" AND episode_status: "active"`. + +| Field | Type | Description | Accepted values | Example | +|---|---|---|---|---| +| `episode_id` | string | Unique identifier of the alert episode. | Any string | `episode_id: "ep-001"` | +| `episode_status` | string | Current lifecycle status of the alert episode. | `inactive`, `pending`, `active`, `recovering` | `episode_status: "active"` | +| `severity` | string | Current severity of the alert episode. Populated when the rule's ES\|QL query includes a `severity` column whose value matches a supported level (case-insensitive). Unrecognized values are silently ignored and the field is absent. 
Not set during recovery. Use to route high-severity episodes to dedicated workflows. | `info`, `low`, `medium`, `high`, `critical` | `severity: "critical" OR severity: "high"` | +| `group_hash` | string | Stable hash identifying the alert series the alert episode belongs to. | Any string | `group_hash: "abc123"` | +| `last_event_timestamp` | string | ISO 8601 timestamp of the most recent event recorded for the alert episode. | ISO 8601 timestamp | `last_event_timestamp > "2026-01-01"` | +| `rule.id` | string | Unique identifier of the rule that generated the alert episode. | Any string | `rule.id: "rule-001"` | +| `rule.name` | string | Display name of the rule. | Any string | `rule.name: "High CPU"` | +| `rule.tags` | string[] | Tags attached to the rule. Use to match alert episodes from rules with a specific tag. | Any string | `rule.tags: "payment-service"` | +| `data.*` | object | Dynamic payload fields sent by the rule. Available fields depend on the rule type and configuration. Use for rule-specific fields not covered by the standard episode fields above. | Depends on rule type | `data.host.name: "web-01"` | + + + +## Notify per options [notification-grouping] + +Controls how the policy batches matching episodes before sending a notification. + +| Option | Description | When to use | +|---|---|---| +| Episode | The policy sends one notification per alert episode, independently of other episodes. Default selection. | You need per-issue visibility and want to handle each problem separately. | +| Group | The policy bundles alert episodes that share the same value for a specified `data.*` field into one notification per unique value. Each unique value forms a **notification group**. | A rule produces many related alert episodes, such as one per service or host, and you want to reduce noise by batching them into shared notifications. | +| Digest | The policy combines all matching alert episodes into a single notification, regardless of what they have in common. 
| You want a single periodic summary of everything that matched, rather than individual alert episodes. | + +## Frequency [throttle-strategies] + +Frequency controls how often the policy fires for a given alert episode or notification group. The available options depend on the **Notify per** setting. Not all options are valid for all modes. + +| Option | Description | When to use | +|---|---|---| +| On status change | Notifies when the alert episode status changes, for example from active to recovering. One notification per transition. | You only need to know when something breaks and when it's resolved. Use this when you trust your ticketing or incident workflow to track ongoing issues. | +| On status change + repeat at interval | Notifies on status change, then resends notifications at a regular interval while the alert episode remains in the same status. | You want status change notifications plus periodic reminders that a problem is still unresolved, in case it has been missed or pushed aside. | +| At most once every… | Caps notifications at one per alert episode or notification group within the chosen interval, regardless of rule frequency. | You want to limit notification volume for noisy rules without missing new or ongoing issues. | +| Every evaluation | Notifies on every rule evaluation. Can be noisy. Use sparingly and only with infrequent rule schedules. | You need a full audit trail of every evaluation, or the rule runs infrequently enough that noise isn't a concern. | + +### Frequency options for Episode [frequency-when-episode-per_episode] + +Available frequency options when you set **Notify per** to **Episode**. + +| Option | Description | Example | +|---|---|---| +| On status change | Notifies once when the alert episode opens and once when it recovers. No repeat notifications while it remains active. | A host goes down at 9:00am → one notification. Recovers at 11:00am → one notification. No notifications between them. 
| +| On status change + repeat at interval | Same as On status change, but also sends a reminder at a set interval while the alert episode is still active. | A host goes down at 9:00am → notification. With a 1h repeat: reminder at 10:00am, 11:00am. Recovers at 11:30am → notification. | +| Every evaluation | Fires on every rule evaluation, regardless of status. Can be noisy on frequent rule schedules. Avoid in production. | A rule running every 5 minutes with one active alert episode produces up to 288 notifications per day. | + +### Frequency options for Group + +Available frequency options when you set **Notify per** to **Group**. + +| Option | Description | Example | +|---|---|---| +| At most once every… | Limits how often each notification group can notify, regardless of how many alert episodes match or how often the rule runs. | 10 alert episodes share `data.host.name: "web-01"`. With a 1h limit, you get at most one notification per hour for that notification group. | +| Every evaluation | Fires on every rule evaluation for each unique value in the group-by field. Still noisy on frequent rule schedules. | A rule running every 10 minutes with 5 unique host values produces up to 6 notifications per host per hour. | + +### Frequency options for Digest + +Available frequency options when you set **Notify per** to **Digest**. + +| Option | Description | Example | +|---|---|---| +| At most once every… (default) | Caps digest delivery to at most one bundled summary within the chosen interval, regardless of how often the rule runs. | A rule running every 5 minutes with a 1h digest interval sends one bundled summary per hour containing all matching alert episodes from that period. | +| Every evaluation | Fires on every rule run, bundling all matching alert episodes into one message. Can be noisy on frequent rule schedules. | A rule running every 30 minutes with 20 matching alert episodes produces one summary every 30 minutes containing all 20. 
| + +## Dispatch outcomes + +The dispatcher records each run with one of the following outcomes. To investigate delivery issues, open Discover, query the `.alert-actions` index, and filter by the `action_type` field. + +| Outcome | What happened | +|---|---| +| `dispatched` | The dispatcher invoked a workflow for the alert episode. | +| `throttled` | The alert episode matched a policy but was rate-limited by the frequency setting. No workflow ran. This is expected behavior, not an error. | +| `suppressed` | Dispatch was blocked. The alert episode was acknowledged, snoozed, or deactivated, or the space is currently in a [maintenance window](../../alerts/maintenance-windows.md). | +| `unmatched` | No action policy matched the alert episode. No workflow ran. | + +## Related pages + +- [Create and configure an action policy](create-configure-action-policy.md) to apply these fields and options when setting up a policy. +- [Manage action policies in {{alerting-v2-system}}](manage-action-policies.md) to enable, disable, snooze, or audit your policies. +- [About action policies](about-action-policies.md) to understand how action policies evaluate and gate alert episodes. \ No newline at end of file diff --git a/explore-analyze/alerting/kibana-alerting-experimental/action-policies/common-action-policy-scenarios.md b/explore-analyze/alerting/kibana-alerting-experimental/action-policies/common-action-policy-scenarios.md new file mode 100644 index 0000000000..f18a724986 --- /dev/null +++ b/explore-analyze/alerting/kibana-alerting-experimental/action-policies/common-action-policy-scenarios.md @@ -0,0 +1,104 @@ +--- +navigation_title: Common scenarios +applies_to: + stack: experimental 9.5+ + serverless: experimental +products: + - id: kibana +description: "Common action policy scenarios for the experimental alerting system, including routing by severity, handling severity escalation, and controlling re-notification." 
+--- + +# Common action policy scenarios [common-action-policy-scenarios] + +Action policies are part of the {{alerting-v2-system}} in {{kib}}. This page covers common situations you're likely to encounter when setting up action policies, and explains how to configure them to get the behavior you expect. + +## Route alert episodes to different workflows by severity [routing-by-severity] + +When your rules produce alert episodes at different severity levels, you'll often want to route them to different workflows. For example, you might page an on-call team for critical episodes while sending lower-severity episodes to a Slack channel for async review. + +To do this, create separate policies scoped to specific severity values using match conditions: + +| Field | Policy A | Policy B | +|---|---|---| +| **Policy type** | Global | Global | +| **Match conditions** | `severity: "critical"` | `severity: "low" OR severity: "medium" OR severity: "high"` | +| **Notify per** | Episode | Episode | +| **Frequency** | On status change | On status change | +| **Destinations** | PagerDuty workflow | Slack workflow | + +Each policy evaluates alert episodes independently: + +- An episode with `severity: "critical"` matches Policy A but not Policy B. +- An episode with `severity: "high"` matches Policy B but not Policy A. +- If an episode's severity changes mid-lifecycle, the policies that match it change accordingly. For example, if an episode escalates from `high` to `critical`, Policy A starts matching and Policy B stops matching. Policy A fires because it has no prior notification record for that episode. + +## Manage notifications across severity changes [severity-escalation] + +To manage notifications effectively when episode severity changes, you need to understand how policies match and re-match episodes as severity shifts. Whether a severity change fires a notification depends on whether the policy has matched the episode before and what frequency option is set. 
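The matching behavior this section describes can be modeled with a small Python sketch. It is a simplification under stated assumptions, not the dispatcher's implementation. The `should_notify` function and the `records` dictionary are hypothetical. The sketch assumes the `On status change` frequency, and it assumes that the notification record is dropped when a policy stops matching, so that a later re-match behaves like a first match.

```python
def should_notify(records: dict, policy: str, episode_id: str,
                  matches: bool, status: str) -> bool:
    """Model of `On status change` for one policy and one episode.

    `records` maps (policy, episode_id) to the status last notified.
    """
    key = (policy, episode_id)
    if not matches:
        # Assumption: forget the record so that a later re-match
        # is treated like a first match.
        records.pop(key, None)
        return False
    if key not in records or records[key] != status:
        # First match, or the status changed since the last notification.
        records[key] = status
        return True
    # Same status as the last notification: a severity change alone does not fire.
    return False


# An episode stays active while its severity moves low -> critical -> high -> critical.
records = {}
timeline = ["low", "critical", "critical", "high", "critical"]
fired = [
    should_notify(records, "critical-only", "ep-001", sev == "critical", "active")
    for sev in timeline
]
# fired == [False, True, False, False, True]
```

A catch-all policy (one where `matches` is always `True`) would fire once at `low` and stay silent through every later severity change in this timeline, because the status never leaves `active`.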
+ +### Notify when an episode escalates into a new severity threshold + +Scope a policy to the severity level you want to be notified about. When an episode escalates into that severity level for the first time, the policy fires because it has no prior notification record for the episode. + +**Example:** Policy B is scoped to `severity: "critical"`. An episode starts at `low` severity, so Policy B does not match. When the episode escalates to `critical`, Policy B now matches and fires (regardless of the frequency setting) because it has never notified for this episode before. + +| Field | Value | +|---|---| +| **Policy type** | Global | +| **Match conditions** | `severity: "critical"` | +| **Notify per** | Episode | +| **Frequency** | On status change | +| **Destinations** | PagerDuty workflow | + +### Prevent duplicate notifications when severity changes within an existing match + +If a policy already matched an episode at a lower severity and the episode escalates, the policy does not automatically re-notify. With `On status change` frequency, a severity change alone does not count as a status change. + +**Example:** Policy A matches all episodes regardless of severity. It notified when the episode was `low`. The episode escalates to `critical`, but Policy A still matches and the status has not changed, only the severity has. The throttle blocks re-notification. To re-notify on escalation, use a time-based throttle or create separate policies per severity level as described in [Route alert episodes to different workflows by severity](#routing-by-severity). 
+ +| Field | Value | +|---|---| +| **Policy type** | Global | +| **Match conditions** | (None, matches all episodes) | +| **Notify per** | Episode | +| **Frequency** | On status change | +| **Destinations** | Slack workflow | + +### Stop notifications when an episode de-escalates below a policy's threshold + +If an episode drops below a policy's severity threshold, the policy stops matching and sends no further notifications. If the episode later escalates back above the threshold, the policy fires again as if it were the first match. + +**Example:** Policy B is scoped to `severity: "critical"`. An episode de-escalates from `critical` to `high`. Policy B no longer matches and stops sending notifications. If the episode later escalates back to `critical`, Policy B fires again. + +| Field | Value | +|---|---| +| **Policy type** | Global | +| **Match conditions** | `severity: "critical"` | +| **Notify per** | Episode | +| **Frequency** | On status change | +| **Destinations** | PagerDuty workflow | + +## Re-notify for persistently active episodes [controlling-re-notification] + +The `On status change` frequency option notifies once per status transition (for example, when an episode activates or resolves). This is efficient for reducing noise, but it means that a persistently active episode that only changes in severity won't re-trigger a notification. + +If you want re-notification for episodes that stay active without a status change, use a time-based throttle instead: + +- **`At most once every…`:** Re-notifies after the configured interval regardless of whether severity or status changed. For example, `1h` sends a follow-up notification every hour while the episode remains active and matched. +- **`On status change + repeat at interval`:** Notifies on status change and then repeats at the configured interval while the episode stays in the same status. + +**Example:** You want to be re-paged if a critical episode stays open for more than an hour. 
Set the policy frequency to `At most once every 1h`. The policy fires when the episode first matches and then again each hour until the episode resolves or no longer matches. + +| Field | Value | +|---|---| +| **Policy type** | Global | +| **Match conditions** | `severity: "critical"` | +| **Notify per** | Episode | +| **Frequency** | At most once every 1 hour | +| **Destinations** | PagerDuty workflow | + +## Related pages + +- [Action policy reference](action-policy-reference.md) to find descriptions of match condition fields, grouping modes, and frequency options. +- [About action policies](about-action-policies.md) to understand how action policies evaluate and gate alert episodes. +- [Create and configure an action policy](create-configure-action-policy.md) to learn how to set up policy type, match conditions, grouping, frequency, and workflow destinations. diff --git a/explore-analyze/alerting/kibana-alerting-experimental/action-policies/create-configure-action-policy.md b/explore-analyze/alerting/kibana-alerting-experimental/action-policies/create-configure-action-policy.md new file mode 100644 index 0000000000..4220566194 --- /dev/null +++ b/explore-analyze/alerting/kibana-alerting-experimental/action-policies/create-configure-action-policy.md @@ -0,0 +1,81 @@ +--- +navigation_title: Create an action policy +applies_to: + stack: experimental 9.5+ + serverless: experimental +products: + - id: kibana +description: "Create action policies in the experimental alerting system, configure match conditions, Notify per, Frequency, and workflow destinations." +--- + +# Create an action policy for the {{alerting-v2-system}} [create-manage-action-policies] + + +Action policies are part of the {{alerting-v2-system}} in {{kib}}. This page covers how to configure policy type, match conditions, grouping, frequency, and workflow destinations. 
Where rules define what counts as a problem, action policies define what happens when one is detected: which alert episodes generate notifications, how they batch for dispatch, and where they're routed. + +Because policies are separate from rules, you can update notification behavior across many rules at once without touching detection logic, and you can route the same alert episodes differently depending on severity or source. You create and manage policies from the **Action policies** page, not from the rule form. + +For match conditions fields, grouping modes, frequency options, and dispatch outcomes, refer to [Action policy reference](action-policy-reference.md). + +## Policy type [policy-type] + +Action policies only process alert episodes from rules running in Alert mode. Signals produced by rules running in Detect mode are not eligible for action policy evaluation. + +An action policy can be global or per-rule: + +- **Global** - Global policies apply to any alert episode in the space. Use a global policy when you want to route alert episodes from multiple rules. For example, a policy matching `rule.tags: "checkout"` applies to every rule with that tag. +- **Per-rule** - Per-rule policies are scoped to a single rule. Use a per-rule policy when notification routing is specific to one rule and you don't want it to affect other rules in the space. + +The policy type is set at creation and cannot be changed. If you need a different type, create a new policy. + +## Tags [policy-tags] + +Optional string labels you assign to a policy to categorize it or filter it on the Action policies list. Unlike rule tags, policy tags describe the policy itself rather than the alerts it matches. You can add, edit, or remove tags at any time without affecting routing behavior. + +## Match conditions [matcher] + + +An optional [KQL](../../../query-filter/languages/kql.md) expression that filters which alert episodes this policy applies to. 
An empty match condition matches every alert episode covered by the policy's scope. For a global policy, that means all alert episodes in the space. For a per-rule policy, it means all alert episodes from the associated rule. + +The match condition is the sole mechanism for scoping a policy beyond its base type. There are no separate rule type or rule ID selector fields. All scoping is done through this expression. For a global policy that should target a specific rule, use `rule.id: "my-rule-id"` or `rule.tags: "my-tag"` in the match condition. + +Use match conditions to route different alert episodes to different policies, for example, one policy for `severity: "critical"` alert episodes routed to PagerDuty and another for lower-severity episodes routed to Slack. You can also scope by rule, such as `rule.tags: "payment-service"`, to apply a policy only to alert episodes from a set of related rules. For available fields and examples, refer to [Match conditions fields](action-policy-reference.md#matcher-fields). + +## Grouping and frequency [reduce-noise-grouping] + + +**Notify per** controls how alert episodes batch into notifications. **Frequency** controls how often the policy can notify for each batch. + +:::{table} +:widths: 4-4-4 + +| Notify per | What it does | Available Frequency options | +|---|---|---| +| Episode | One notification per alert episode. | - On status change<br>- On status change + repeat at interval<br>- Every evaluation | +| Group | Bundle alert episodes that share a field value. Specify a **Group by** field such as `data.service.name` or `data.host.name`. | - At most once every…<br>- Every evaluation | +| Digest | One notification for all matching alert episodes combined. | - At most once every… (default)<br>- Every evaluation | + +::: + +For detailed descriptions, frequency options, and examples for each mode, refer to [Notify per options](action-policy-reference.md#notification-grouping). + +## Frequency [throttle] + + +**Frequency** limits how often the policy can fire for a given notification group. The interval resets from the last time the policy fired, so successive notifications stay at least `interval` apart. Set a duration such as `1h` or `30m`. For available options by **Notify per** mode, refer to [Frequency](action-policy-reference.md#throttle-strategies). + +:::{note} +`On status change` only re-notifies when the alert episode's status changes, not when its severity changes. If an episode escalates from `low` to `critical` but the policy already matched it and the status hasn't changed, the throttle blocks re-notification. + +To receive escalation notifications, either create separate policies scoped to specific severity levels, or use a time-based throttle such as `At most once every 1h` so the policy re-notifies after the interval regardless of severity or status changes. For examples, refer to [Controlling re-notification](common-action-policy-scenarios.md#controlling-re-notification). +::: + +## Destinations + +One or more [workflows](../../../workflows.md) to invoke when the policy matches. Use the search field to find and attach workflows. + +## Related pages + +- [Manage action policies in {{alerting-v2-system}}](manage-action-policies.md) to view, enable, disable, or snooze the policies you create. +- [Action policy reference in {{alerting-v2-system}}](action-policy-reference.md) to look up match condition fields, grouping modes, and frequency options. +- [About action policies](about-action-policies.md) to understand how action policies evaluate and gate alert episodes. 
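The **Frequency** interval behavior described on this page (the interval resets from the last time the policy fired) can be made concrete with a short Python sketch. The `parse_duration` and `can_fire` helpers are hypothetical. They assume only the `30m` and `1h` duration style mentioned in the Frequency section, and the units the product actually accepts may differ.

```python
import re
from typing import Optional

# Assumed unit suffixes, inferred only from the `30m` and `1h` examples.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(text: str) -> int:
    """Convert a duration such as "30m" or "1h" to seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def can_fire(last_fired: Optional[float], now: float, interval: str) -> bool:
    """True if at least `interval` has passed since the policy last fired."""
    if last_fired is None:
        # No prior notification for this notification group.
        return True
    return now - last_fired >= parse_duration(interval)
```

Under this model, a `1h` interval on a rule that runs every 5 minutes lets at most 1 in 12 evaluations produce a notification for a given notification group.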
diff --git a/explore-analyze/alerting/kibana-alerting-experimental/action-policies/manage-action-policies.md b/explore-analyze/alerting/kibana-alerting-experimental/action-policies/manage-action-policies.md new file mode 100644 index 0000000000..edd7560ebe --- /dev/null +++ b/explore-analyze/alerting/kibana-alerting-experimental/action-policies/manage-action-policies.md @@ -0,0 +1,65 @@ +--- +navigation_title: Manage action policies +applies_to: + stack: experimental 9.5+ + serverless: experimental +products: + - id: kibana +description: "View policy details, enable, disable, snooze, review execution history, and rotate API keys for action policies in the experimental alerting system." +--- + +# Manage action policies for the {{alerting-v2-system}} + +Action policies are part of the {{alerting-v2-system}} in {{kib}}. This page covers how to view policy details, enable and disable policies, snooze them during planned outages, rotate their API keys, and review execution history. + +## View policy details + +From the **Action policies** list, you can open a policy to see its full configuration, including match conditions, grouping mode, frequency, and destinations. The list also shows the display name of the user who created the policy and the user who last updated it. You can also edit, clone, delete, enable, disable, snooze, or update its API key without leaving the list page. + +## Execution history + +After each dispatcher run, {{kib}} records the outcome in the `.alert-actions` index. These records let you audit whether workflows were invoked, skipped, or had no matching policy for each alert episode. + +| Outcome | What it means | +|---|---| +| `dispatched` | The dispatcher invoked a workflow for the alert episode. | +| `throttled` | The alert episode matched a policy but was rate-limited by the frequency setting. No workflow ran. | +| `suppressed` | Dispatch was blocked. 
The alert episode was acknowledged, snoozed, or deactivated, or the space is currently in a [maintenance window](../../alerts/maintenance-windows.md). | +| `unmatched` | No action policy matched the alert episode. No workflow ran. | + +The **Execution history** view lets you search these records by policy name, rule name, or saved-object ID, and filter by outcome. + +To query raw dispatch records directly, open Discover and query the `.alert-actions` index. Filter by `action_type` to narrow by outcome, or by `policy_id` to filter by policy. + +## Enable, disable, and snooze + +You can disable a policy so it is not evaluated for new alert episodes. You can snooze a policy for a defined window so that it does not dispatch notifications during that period. Policies that are not enabled or are snoozed are skipped when the dispatcher evaluates policies. + +:::{note} +Snoozing a policy differs from [snoozing an alert episode](reduce-notification-noise.md#snooze-scope). When you snooze a policy, the dispatch mechanism is paused and every series the policy would process is silenced. When you snooze an alert episode, you target one specific series before policy matching runs, silencing it regardless of which policy would have handled it. Use alert snooze when you want to quiet a specific recurring alert without affecting other series handled by the same policy. +::: + +### Maintenance windows [maintenance-windows] + +During a [maintenance window](../../alerts/maintenance-windows.md), action policies stop dispatching notifications automatically. No policy configuration is required. Rule evaluation continues and alert episodes are still recorded in `.rule-events`. Maintenance windows are configured separately, not on the action policy. + +## Update API keys + +You can rotate the API key used to run a policy's workflows without changing matchers or destinations. Use the **Update API key** action on one policy or for multiple selected policies. 
+ +::::{important} + +**Production considerations** + +When you update or delete an action policy, previous API keys used for execution are marked for invalidation and removed on a schedule managed by {{kib}}. Allow for a short delay before new keys are used for dispatch. +:::: + +## Bulk actions + +On the action policies list, select one or more policies to enable, disable, snooze, and do more in bulk. **Select all** selects every policy on the current page of results. Clear the selection before changing filters if you need a different set. + +## Related pages + +- [Reduce notification noise in {{alerting-v2-system}}](reduce-notification-noise.md) to silence individual alert episodes using acknowledge, snooze, or deactivate. +- [Action policy reference in {{alerting-v2-system}}](action-policy-reference.md) to look up match condition fields, grouping modes, and frequency options. +- [Create and configure an action policy](create-configure-action-policy.md) to set up or update the policies you manage here. diff --git a/explore-analyze/alerting/kibana-alerting-experimental/action-policies/reduce-notification-noise.md b/explore-analyze/alerting/kibana-alerting-experimental/action-policies/reduce-notification-noise.md new file mode 100644 index 0000000000..c68478bd1e --- /dev/null +++ b/explore-analyze/alerting/kibana-alerting-experimental/action-policies/reduce-notification-noise.md @@ -0,0 +1,49 @@ +--- +navigation_title: Reduce notification noise +applies_to: + stack: experimental 9.5+ + serverless: experimental +products: + - id: kibana +description: "How to reduce notification noise in the experimental alerting system using acknowledge, snooze, and deactivate to silence alert episodes." +--- + +# Reduce notification noise for the {{alerting-v2-system}} [reduce-notification-noise] + +Acknowledge, snooze, and deactivate are part of the {{alerting-v2-system}} in {{kib}}. Each one silences notifications for an alert episode at a different scope. 
When an alert episode is silenced, the dispatcher stops processing it before any action policy matching, grouping, or frequency evaluation runs. + +This page covers when to use each silencing mechanism and how the scope of an alert episode snooze differs from the scope of a policy snooze. For an overview of where this fits in the full dispatch cycle, refer to [About action policies](about-action-policies.md). + +## Silencing mechanisms [silencing-mechanisms] + +Three mechanisms let you silence notifications, each at a different scope: + +| Mechanism | Scope | When to use | +|---|---|---| +| Acknowledge | Per alert episode | You're actively investigating a breach and want to silence notifications for it without closing the alert episode. Clear the acknowledgment when you're done to restore notifications. | +| Snooze | Per series (group) | You want to quiet an entire alert series for a defined period, for example, during a known noisy window for a specific host. Snooze expires automatically at the end of the duration. | +| Deactivate | Per alert episode | You want to manually close an alert episode that hasn't recovered automatically. Deactivating marks the alert episode as inactive and stops notifications for it. Unlike acknowledge, this closes the alert episode rather than silencing it while leaving it active. | + +Each mechanism is stored as a separate document in `.alert-actions`, so the full gating history for an episode is queryable in Discover. + +### Snooze scope + +Snooze applies at the group level (by `group_hash`), not per individual alert episode. When you snooze one alert episode, every alert episode sharing the same group (all rows with the same `rule_id` and `group_hash`) is silenced for the duration. Snoozing one row in the alerts table silences the entire series for that rule. + + + +:::{note} +Snoozing an alert episode differs from [snoozing an action policy](manage-action-policies.md#enable-disable-and-snooze). 
When you snooze a policy, the dispatch mechanism is paused and every series the policy would process is silenced. When you snooze an alert episode, you target one specific series before policy matching runs, silencing it regardless of which policy would have handled it. Use policy snooze when you want to pause all notifications from a given policy, for example, during planned maintenance on a destination system. +::: + +## Related pages + + +- [About action policies](about-action-policies.md) to learn how action policies route and throttle alert episodes after silencing. +- [Create and configure an action policy](create-configure-action-policy.md) to set up the policies that run after gating checks pass. +- [Action policy reference](action-policy-reference.md) to look up match condition fields, grouping modes, and frequency options. diff --git a/explore-analyze/alerting/kibana-alerting-experimental/notifications-actions.md b/explore-analyze/alerting/kibana-alerting-experimental/notifications-actions.md new file mode 100644 index 0000000000..a5e265e6c5 --- /dev/null +++ b/explore-analyze/alerting/kibana-alerting-experimental/notifications-actions.md @@ -0,0 +1,36 @@ +--- +navigation_title: Notifications and actions +applies_to: + stack: experimental 9.5+ + serverless: experimental +products: + - id: kibana +description: "How to set up notifications and actions for rules in the experimental alerting system using workflows and action policies." +--- + +# Notifications and actions for the {{alerting-v2-system}} + +Rules in the {{alerting-v2-system}} don't send notifications directly. Instead, they produce alert episodes, and you use workflows and action policies to decide what happens next. + +- **Workflows** are the delivery layer. They define what happens when the system decides to act, such as sending a message, calling a webhook, or triggering an automation. +- **Action policies** are the gating layer. 
They evaluate active alert episodes on a continuous schedule and invoke workflows based on match conditions, grouping, and frequency settings. + +## Get started + +To send notifications or run actions from a rule in the {{alerting-v2-system}}: + +1. [Build a workflow](../../workflows/get-started/build-your-first-workflow.md) that defines the action to take. +2. [Create an action policy](action-policies/create-configure-action-policy.md) that routes alert episodes to that workflow. + +:::{tip} +If you need an action to fire exactly once in response to a specific alert episode state change (such as opening a ticket when an episode is assigned), use an alert episode lifecycle trigger instead of an action policy. Refer to [Connect workflows](workflows-alerting.md) for a comparison of both approaches. +::: + +## In this section + +- [Connect workflows](workflows-alerting.md) - How action policies and lifecycle triggers invoke workflows at runtime. +- [About action policies](action-policies/about-action-policies.md) - How action policies evaluate and gate alert episodes. +- [Create an action policy](action-policies/create-configure-action-policy.md) - Configure policy type, match conditions, grouping, frequency, and destinations. +- [Action policy reference](action-policies/action-policy-reference.md) - Available match condition fields, grouping modes, and frequency options. +- [Manage action policies](action-policies/manage-action-policies.md) - Enable, disable, snooze, edit, or delete policies. +- [Reduce notification noise](action-policies/reduce-notification-noise.md) - Silence alert episodes using acknowledge, snooze, and deactivate.
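To make the gating idea in this file more concrete, here is a minimal Python sketch of one action policy evaluating one alert episode. It assumes the gate order described in About action policies (suppression, then match conditions, then frequency) and borrows the outcome labels from the execution history table, applied here per policy for simplicity. The `Episode` and `Policy` classes, the `matches` callable standing in for a KQL match condition, and the `in_cooldown` flag are all hypothetical names for illustration, not the dispatcher's actual data model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Episode:
    """Hypothetical alert episode with only the fields this sketch needs."""
    severity: str
    acknowledged: bool = False
    snoozed: bool = False


@dataclass
class Policy:
    """Hypothetical action policy; not the real saved-object shape."""
    name: str
    matches: Callable[[Episode], bool]  # stand-in for a KQL match condition
    in_cooldown: bool = False           # stand-in for the frequency setting


def evaluate(episode: Episode, policy: Policy) -> str:
    """Run one episode through one policy's gates and return an outcome label."""
    # Gate 1: suppression stops silenced episodes before any matching runs.
    if episode.acknowledged or episode.snoozed:
        return "suppressed"
    # Gate 2: the match condition decides whether this policy applies at all.
    if not policy.matches(episode):
        return "unmatched"
    # Gate 3: frequency holds the episode back while the policy is cooling down.
    if policy.in_cooldown:
        return "throttled"
    # All gates cleared, so the policy's workflows would be invoked.
    return "dispatched"
```

Calling `evaluate` once per enabled policy mirrors the point that policies evaluate episodes independently: an episode that one policy does not match can still be dispatched by another.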
diff --git a/explore-analyze/alerting/kibana-alerting-experimental/workflows-alerting.md b/explore-analyze/alerting/kibana-alerting-experimental/workflows-alerting.md new file mode 100644 index 0000000000..4e11e40fa9 --- /dev/null +++ b/explore-analyze/alerting/kibana-alerting-experimental/workflows-alerting.md @@ -0,0 +1,61 @@ +--- +navigation_title: Connect workflows +applies_to: + stack: experimental 9.5+ + serverless: experimental +products: + - id: kibana +description: "How workflows connect to the experimental alerting system through action policies and alert episode lifecycle triggers, and when to use each." +--- + +# Connect workflows to the {{alerting-v2-system}} [connect-workflows-experimental-alerting-system] + +[Workflows](../../workflows.md) are part of the {{alerting-v2-system}} in {{kib}}. They are the delivery layer that defines what happens when the {{alerting-v2-system}} takes an action, such as sending a message, calling a webhook, or triggering an automation. Workflows are what allow your team's incident response tools to connect with the {{alerting-v2-system}}. + +This page covers how action policies drive workflow invocations at runtime, the available alert episode lifecycle triggers, and when to use each pathway. + +## How the alerting system connects to workflows [connection-pathways] + +The {{alerting-v2-system}} connects to workflows through two pathways. + +- **Action policies** - Action policies evaluate active alert episodes on a continuous schedule and invoke workflows based on match conditions and frequency settings. +- **Alert episode lifecycle triggers** - Workflows are invoked when a specific state change occurs on an alert episode, such as when the alert episode is activated, assigned, or resolved. + +### Action policies [action-policy-driven-workflows] + +Action policies evaluate alert episodes on a continuous schedule and invoke workflows when an episode meets the configured conditions. 
After a rule runs, the system routes alert episodes to workflows through the following steps. + +``` +Rule → Alert episode → [Dispatcher] → Action policy → Workflow → Notification +``` + +1. A rule evaluates data on a schedule and writes a rule event. +2. In Alert mode, the rule event opens or updates an alert episode. +3. The dispatcher runs on a short interval, independently of the rule schedule, and picks up active alert episodes. +4. For each active alert episode, the dispatcher evaluates all enabled action policies. Each policy runs the episode through suppression, match conditions, grouping, and frequency gates. +5. For policies where the episode clears all gates, the dispatcher invokes the configured workflows. +6. Workflows deliver the notification or run the automation. + +### Alert episode lifecycle triggers [alert-episode-lifecycle-triggers] + +Lifecycle triggers start a workflow immediately in response to a specific state change on an alert episode, without any scheduling or gating. Alert episode lifecycle triggers are a type of [event-driven trigger](../../workflows/triggers/event-driven-triggers.md) that start a workflow automatically when a specific event occurs. + +The {{alerting-v2-system}} emits a trigger event each time an alert episode changes state (for example, when it's activated, assigned to a user, acknowledged, or snoozed) and any workflow attached to that trigger type runs immediately in response. + +For a list of available triggers and event payload fields, refer to [Alert episode lifecycle triggers](../../workflows/triggers/event-driven-triggers.md). + +### When to use action policies or lifecycle triggers [when-to-use-action-policies-lifecycle-triggers] + +If you're unsure whether to use lifecycle triggers or action policies, the following table compares when each option is a good fit. Both can run different workflows simultaneously and coexist without conflict. 
+ +| | Action policies | Lifecycle triggers | +|---|---|---| +| **How they run** | Evaluate alert episodes on a continuous schedule | React immediately to a specific state change | +| **Frequency control** | Apply suppression, grouping, and frequency gates | Fire exactly once per state change, no gates to configure | +| **Best for** | Recurring notifications and escalation logic that runs as long as a problem persists | One-shot automations, such as opening a ticket when an episode is assigned or posting a message when it resolves | + +## Next steps + +- [Create and configure an action policy](action-policies/create-configure-action-policy.md) to start routing alert episodes to workflows. +- [About action policies](action-policies/about-action-policies.md) to learn how action policies evaluate and gate alert episodes before invoking a workflow. + diff --git a/explore-analyze/alerting/watcher/actions.md b/explore-analyze/alerting/watcher/actions.md index 0181302180..b3d6189a5f 100644 --- a/explore-analyze/alerting/watcher/actions.md +++ b/explore-analyze/alerting/watcher/actions.md @@ -147,7 +147,7 @@ If you do not define a throttle period at the action or watch level, the global xpack.watcher.execution.default_throttle_period: 15m ``` -{{watcher}} also supports acknowledgement-based throttling. You can acknowledge a watch using the [ack watch API]({{es-apis}}operation/operation-watcher-ack-watch) to prevent the watch actions from being executed again while the watch condition remains `true`. This essentially tells {{watcher}} "I received the notification and I’m handling it, do not notify me about this error again". An acknowledged watch action remains in the `acked` state until the watch’s condition evaluates to `false`. When that happens, the action’s state changes to `awaits_successful_execution`. +{{watcher}} also supports acknowledgment-based throttling. 
You can acknowledge a watch using the [ack watch API]({{es-apis}}operation/operation-watcher-ack-watch) to prevent the watch actions from being executed again while the watch condition remains `true`. This essentially tells {{watcher}} "I received the notification and I’m handling it, do not notify me about this error again". An acknowledged watch action remains in the `acked` state until the watch’s condition evaluates to `false`. When that happens, the action’s state changes to `awaits_successful_execution`. To acknowledge an action, you use the [ack watch API]({{es-apis}}operation/operation-watcher-ack-watch): diff --git a/explore-analyze/toc.yml b/explore-analyze/toc.yml index 46aad620e1..6bacb804b8 100644 --- a/explore-analyze/toc.yml +++ b/explore-analyze/toc.yml @@ -382,6 +382,15 @@ toc: - file: report-and-share/reporting-troubleshooting-pdf.md - file: alerting.md children: + - file: alerting/kibana-alerting-experimental/notifications-actions.md + children: + - file: alerting/kibana-alerting-experimental/workflows-alerting.md + - file: alerting/kibana-alerting-experimental/action-policies/about-action-policies.md + - file: alerting/kibana-alerting-experimental/action-policies/common-action-policy-scenarios.md + - file: alerting/kibana-alerting-experimental/action-policies/create-configure-action-policy.md + - file: alerting/kibana-alerting-experimental/action-policies/action-policy-reference.md + - file: alerting/kibana-alerting-experimental/action-policies/manage-action-policies.md + - file: alerting/kibana-alerting-experimental/action-policies/reduce-notification-noise.md - file: alerting/alerts.md children: - file: alerting/alerts/alerting-getting-started.md diff --git a/explore-analyze/workflows/triggers/event-driven-triggers.md b/explore-analyze/workflows/triggers/event-driven-triggers.md index 747135956d..ccc29d1b8d 100644 --- a/explore-analyze/workflows/triggers/event-driven-triggers.md +++ b/explore-analyze/workflows/triggers/event-driven-triggers.md 
@@ -3,7 +3,7 @@ navigation_title: Event-driven triggers applies_to: stack: preview 9.4+ serverless: preview -description: Run a workflow in response to a platform event. Includes workflows.failed and the cases trigger family. +description: Run a workflow in response to a platform event. Includes workflows.failed, the cases trigger family, and alert episode lifecycle triggers. products: - id: kibana - id: cloud-serverless @@ -15,10 +15,11 @@ products: # Event-driven triggers [workflows-event-driven-triggers] -Event-driven triggers let workflows react to events elsewhere in {{kib}}. Two trigger families are available: +Event-driven triggers let workflows react to events elsewhere in {{kib}}. Three trigger families are available: - **`workflows.failed`** — Fires when another workflow's execution fails. {applies_to}`stack: preview 9.4+` {applies_to}`serverless: preview` - **Cases triggers** — Fire when cases change (created, updated, status changed, attachments added, comments added). {applies_to}`stack: preview 9.5+` {applies_to}`serverless: preview` +- **Alert episode lifecycle triggers** — Fire when an alert episode changes state in the experimental {{alerting-v2-system}}, such as when it is activated, assigned, acknowledged, or snoozed. {applies_to}`stack: experimental 9.5+` {applies_to}`serverless: experimental` :::{warning} The event-driven trigger system is in technical preview, including the triggers documented on this page. The schema and semantics can change in future releases. @@ -341,8 +342,55 @@ triggers: condition: 'event.owner: "securitySolution"' ``` +## Alert episode lifecycle triggers [alert-episode-lifecycle-triggers-event-driven] + +```{applies_to} +stack: experimental 9.5+ +serverless: experimental +``` + +Alert episode lifecycle triggers fire when an alert episode changes state in the experimental {{alerting-v2-system}}. Unlike `workflows.failed` and cases triggers, they are not configured through a `triggers` block in your workflow YAML. 
They are emitted by the alerting system and automatically invoke any workflow attached to the matching trigger type. Each trigger fires exactly once per state change. There is no polling interval or frequency gate. + +### Available triggers [alert-episode-lifecycle-triggers-available] + +| Trigger ID | When it fires | +|---|---| +| `alerting.episodeActivated` | An alert episode transitions to the active state. | +| `alerting.episodeDeactivated` | An alert episode is manually deactivated or recovers. | +| `alerting.episodeSnoozed` | An alert episode is snoozed. | +| `alerting.episodeUnsnoozed` | An alert episode is unsnoozed. | +| `alerting.episodeAcked` | An alert episode is acknowledged. | +| `alerting.episodeUnacked` | An alert episode acknowledgment is removed. | +| `alerting.episodeAssigned` | An alert episode is assigned to a user. | +| `alerting.episodeUnassigned` | An alert episode assignment is removed. | +| `alerting.episodeTagged` | A tag is applied to an alert episode. | + +### Event payload [alert-episode-lifecycle-triggers-event] + +All lifecycle triggers include these common fields in the event payload. + +| `event.*` field | Contains | +|---|---| +| `event.episodeId` | Unique identifier of the alert episode. | +| `event.ruleId` | ID of the rule that produced the alert episode. | +| `event.spaceId` | ID of the {{kib}} space where the event occurred. | + +Reference these fields with Liquid templating in workflow steps: + +```yaml +- name: log + type: console + with: + message: | + Episode {{ event.episodeId }} from rule {{ event.ruleId }} changed state. +``` + +Use these fields to write workflow conditions that scope the automation to specific rules or episodes. For example, use `event.ruleId: "my-rule-id"` to scope the workflow to alert episodes from a specific rule. + ## Prevent cascading handler loops +This section applies to `workflows.failed` handlers. 
Alert episode lifecycle triggers fire once per state change and do not re-trigger on workflow failure. + If a handler workflow itself fails, it can re-trigger itself. Two safeguards help you avoid infinite loops: - Every event includes `event.workflow.isErrorHandler`, which is `true` when the failing workflow is itself a handler. Filter on this in your handler's logic to skip handling your own failures. @@ -355,3 +403,4 @@ In practice, keep handler workflows simpler than the workflows they monitor. A h - [Triggers overview](/explore-analyze/workflows/triggers.md): All trigger types. - [Pass data and handle errors](/explore-analyze/workflows/authoring-techniques/pass-data-handle-errors.md): Per-step `on-failure` strategies complement event-driven handlers. - [Cases steps](/explore-analyze/workflows/steps/cases.md): Open cases from your handler. +- [Connect workflows to the {{alerting-v2-system}}](../../alerting/kibana-alerting-experimental/workflows-alerting.md): How action policies and alert episode lifecycle triggers invoke workflows, and when to use lifecycle triggers versus action policies.