diff --git a/explore-analyze/alerting/kibana-alerting-experimental/rules.md b/explore-analyze/alerting/kibana-alerting-experimental/rules.md index 3466834bbd..6abc9ea88f 100644 --- a/explore-analyze/alerting/kibana-alerting-experimental/rules.md +++ b/explore-analyze/alerting/kibana-alerting-experimental/rules.md @@ -12,7 +12,9 @@ description: "Rules in Kibana's experimental alerting system define what to dete Rules are part of the {{alerting-v2-system}} in {{kib}}. For rules in the existing Kibana alerting system, see [Rules in Kibana alerting](../alerts/create-manage-rules.md). -A rule is where the {{alerting-v2-system}} starts. It points {{kib}} at the data you care about, describes what counts as a problem in {{esql}}, and says how often to check. Alerts, action policies, and notifications all flow from what a rule detects. +A rule is where the {{alerting-v2-system}} starts. It points {{kib}} at the data you care about, describes what counts as a problem in {{esql}}, and says how often to check. Alerts, action policies, and notifications all flow from what a rule detects. + +This page explains what rules do, what they don't control, and how to choose a creation path. ## What rules do [detection-and-notification] diff --git a/explore-analyze/alerting/kibana-alerting-experimental/rules/author-rules.md b/explore-analyze/alerting/kibana-alerting-experimental/rules/author-rules.md index 7fa4df90a1..bf31453687 100644 --- a/explore-analyze/alerting/kibana-alerting-experimental/rules/author-rules.md +++ b/explore-analyze/alerting/kibana-alerting-experimental/rules/author-rules.md @@ -10,30 +10,30 @@ description: "Write ES|QL detection queries for rules in Kibana's experimental a # Rule authoring in the {{alerting-v2-system}} [author-rules] -Rule authoring is part of the {{alerting-v2-system}} in {{kib}}. 
Authoring a rule means deciding three things: what condition in your data counts as a problem, whether you want the rule to silently record matches or actively track issues through to resolution, and which fields to carry forward onto each alert event so you can route and triage effectively. Getting these decisions right in the query is what makes the difference between a rule that fires on everything and one that surfaces the problems that actually need attention. +Rules in the {{alerting-v2-system}} use {{esql}} queries to detect problems in your data. When you create a rule, you choose whether it should fire notifications or just record matches, define the detection logic, and optionally set severity levels. -This page covers the query concepts behind a rule definition. For settings beyond the query (such as schedules, grouping, and lifecycle thresholds), refer to [Configure a rule](configure-a-rule.md). Once you understand what goes into a rule, you can write one using the [rule builder](create-rule-from-rule-builder.md), [YAML editor](create-rule-with-yaml.md), or [a Discover session](create-rule-from-discover.md). +This page explains the concepts behind rule modes, detection logic, and severity. For other rule settings such as schedules, grouping, and lifecycle thresholds, refer to [Configure a rule](configure-a-rule.md). When you're ready to create a rule, use the [rule builder](create-rule-from-rule-builder.md), [YAML editor](create-rule-with-yaml.md), or [Discover](create-rule-from-discover.md). -## Choose a rule mode +## Rule modes -Before creating the rule, decide what you want it to do: +Whether a rule fires notifications or just records matches is determined by its mode. Rules can run in Signal or Alert mode. The mode available to you depends on how the rule is created. | Mode | What it does | | --- | --- | -| Signal (`kind: signal`) | Records query matches as signals. No alert episodes, no notifications. 
Good for testing a query or building a data history without alerting anyone. | -| Alert (`kind: alert`) | Records matches and maintains alert episodes with lifecycle states. Alert episodes appear on the **Alerts** page and can be matched by action policies for notifications. | +| Signal | Records query matches as signals. No alert episodes, no notifications. Good for testing a query or building a data history without alerting anyone. | +| Alert | Records matches and maintains alert episodes with lifecycle states. Alert episodes appear on the **Alerts** page and can be matched by action policies for notifications. | -You can switch a rule's mode after creation from the rule list or rule detail page. +## Define the detection logic [esql-query-structure] -## The {{esql}} query [esql-query-structure] - -Every rule has two parts to its query: the base query and the alert conditions. +The detection logic is an {{esql}} query that tells the rule what to look for. Every query has two parts: the base query and the alert conditions. ### Base query (required) -The main {{esql}} expression. It runs on every evaluation, selects data from `FROM`, shapes results with `STATS`, `WHERE`, `EVAL`, and controls which fields are stored with `KEEP`. The base query always runs, even when no breach occurs, which is what enables no-data detection and recovery. +The base query is the main {{esql}} expression. Use `FROM` to point the rule at the indices or data streams to read. The query itself defines the scope. + +The base query shapes results with `STATS`, `WHERE`, and `EVAL`, and controls which fields are stored with `KEEP`. It runs on every evaluation, even when no match occurs, which is what enables no-data detection and recovery. The [{{esql}} reference](elasticsearch://reference/query-languages/esql.md) covers all available commands and processing functions. ### Alert conditions (optional) -A `WHERE` clause applied after the base query. 
Only rows that pass the alert condition are treated as breaches. Without an alert condition, every row returned by the base query is a breach. +The alert conditions query is the `WHERE` clause that's applied after the base query. Only rows that pass the alert condition are treated as breaches. Without an alert condition, every row returned by the base query is a breach. ```esql -- Base query: compute average CPU per host @@ -46,83 +46,11 @@ WHERE avg_cpu > 0.9 The `KEEP` command controls which fields appear on each stored alert event. Only the fields you `KEEP` are available for policy matchers, grouping keys, and triage in the Alerts UI. -### Recovery condition [recovery-condition] - -Recovery conditions are optional. They determine when an active alert episode closes. - -Three recovery types are available: - -| Type | Behavior | -| --- | --- | -| Default | The alert episode recovers automatically when the alert condition is no longer met. | -| Custom | Uses a separate {{esql}} expression you define. The alert episode recovers when that expression returns no rows. | - -When no recovery condition is configured, Default recovery applies. Use a custom recovery condition when the absence of a breach isn't a reliable recovery indicator — for example, when the alert condition uses a narrow lookback window and you want recovery to require the condition to stay clear across a longer period, or when the recovery logic requires a different query shape than the alert detection. - -## Data sources - -Use `FROM` to point the rule at the indices or data streams to read. The query itself defines the scope. There's no separate data source step. - -```esql -FROM logs-checkout-service-* -| WHERE http.response.status_code >= 500 -| STATS error_count = COUNT(*) BY service.name -| KEEP service.name, error_count -``` - -The [{{esql}} reference](elasticsearch://reference/query-languages/esql.md) covers all available commands and processing functions. 
- -## Conditions and thresholds [conditions-and-thresholds] - -The alert condition in {{esql}} defines what counts as a breach in each evaluation. - -The activation and recovery thresholds on the rule are separate from the query. They control how many consecutive breaches must occur, or how long the condition must persist, before an alert episode becomes active or moves back to inactive. Those settings are in [Configure a rule](configure-a-rule.md#activation-recovery-thresholds). - - - ## Severity levels [severity-levels] -Severity is a first-class field on alert episodes in the {{alerting-v2-system}}. To set severity, include a column named `severity` in your ES|QL query output and add it to your `KEEP` list. The framework reads that column after each evaluation and maps it to one of five fixed levels: - -| Value | Meaning | -| --- | --- | -| `info` | Informational; lowest urgency | -| `low` | Low-severity condition | -| `medium` | Moderate-severity condition | -| `high` | High-severity condition | -| `critical` | Critical; highest urgency | - -Severity matching is case-insensitive. Values that don't match one of the five levels are silently ignored — the alert episode is still created, but `episode.severity` is not set. - -Severity is set only on `breached` rule events. `recovered` and `no_data` events don't carry a severity value. - -When severity is set, the framework stores two fields on the alert episode: - -- `episode.severity` — The severity value from the most recent breached event (current state). -- `episode.severity_max` — The highest severity level observed across the episode's lifetime. Useful for routing like "this episode peaked at critical." - -Both fields are available for action policy matchers. Refer to [Rule event and field reference](rule-event-field-reference.md#episode-fields) for more information about these fields. 
- -```esql -FROM metrics-* -| STATS - errors_5m = COUNT_IF(outcome == "failure" AND @timestamp >= NOW() - 5 minutes), - total_5m = COUNT_IF(@timestamp >= NOW() - 5 minutes) - BY service.name -| EVAL burn_5m = errors_5m / total_5m -| EVAL severity = CASE( - burn_5m > 14.4, "critical", - burn_5m > 6.0, "high", - burn_5m > 1.0, "medium", - "low" - ) -| WHERE burn_5m > 1.0 -| KEEP service.name, burn_5m, severity -``` +To assign severity to alert episodes, include a column named `severity` in your {{esql}} query and add it to your `KEEP` list. The framework maps it to one of five fixed levels: `info`, `low`, `medium`, `high`, or `critical`. -The `severity` column in `KEEP` is what tells the framework to set `episode.severity` on each resulting alert episode. +Severity is used by action policy matchers for routing and triage. For details on how severity is stored and made available to matchers, refer to [Configure a rule](configure-a-rule.md#rule-severity). ## Next steps diff --git a/explore-analyze/alerting/kibana-alerting-experimental/rules/configure-a-rule.md b/explore-analyze/alerting/kibana-alerting-experimental/rules/configure-a-rule.md index 2dd23ec35b..8057032152 100644 --- a/explore-analyze/alerting/kibana-alerting-experimental/rules/configure-a-rule.md +++ b/explore-analyze/alerting/kibana-alerting-experimental/rules/configure-a-rule.md @@ -11,13 +11,13 @@ description: "Configure rules in the experimental alerting system: mode, ES|QL, # Configure a rule in {{alerting-v2-system}} [rule-settings] -Rule configuration is part of the {{alerting-v2-system}} in {{kib}}. The {{esql}} query defines what a rule detects. The settings on this page determine whether it behaves correctly in production: how often it runs, how it groups related problems, when it opens and closes alerts, and what it does when data stops arriving. - -For query authoring, refer to [Author rules](author-rules.md). +Rule configuration is part of the {{alerting-v2-system}} in {{kib}}. 
The {{esql}} query defines what a rule detects. The settings on this page determine whether it behaves correctly in production. For query authoring, refer to [Author rules](author-rules.md). +This page covers how to configure a rule's mode, schedule, grouping, activation and recovery thresholds, and no-data behavior. + :::{note} For Alert-mode rules, you can create and attach action policies directly from the rule form's **Actions** step. Existing action policies that match the rule are listed on load. If none exist, an onboarding panel appears. Action policies are created or updated alongside the rule when you save. The **Actions** step is not shown for Signal-mode rules. @@ -33,22 +33,95 @@ Choose a mode that matches how you want to use results: | Signal | Signals only: the rule produces detections without alert lifecycle tracking or notifications. | | Alert | Lifecycle tracking and actions. Alerts move through states (pending, active, recovering, and so on), and you can attach action policies so alert episodes dispatch through workflows. | -Several settings on this page apply only when the rule is in Alert mode (`kind: alert`). +Several settings on this page apply only when the rule is in Alert mode. ## {{esql}} query [esql-query-rule] -The rule's {{esql}} query defines what to evaluate. It has a base query and an optional alert condition. Together they drive which rows become alert events and how no-data behavior applies. See [{{esql}} query structure](author-rules.md#esql-query-structure) for how those pieces interact with no-data behavior and `KEEP`. +The rule's {{esql}} query defines what to evaluate. It has a base query and an optional alert condition. Together they drive which rows become alert events and how no-data behavior applies. Refer to [{{esql}} query structure](author-rules.md#esql-query-structure) for how those pieces interact with no-data behavior and `KEEP`. 
+ +### Query parameters [query-parameters] + +{{esql}} rule queries support `?param` placeholder syntax for values you want to supply separately from the query body. Placeholders let you write a reusable query and vary threshold values through the rule configuration rather than hard-coding them inline. + +```esql +FROM metrics-* +| STATS avg_cpu = AVG(system.cpu.total.pct) BY host.name +| WHERE avg_cpu > ?threshold +``` + + + +## Severity [rule-severity] + +Severity is optional. To set it, include a column named `severity` in your {{esql}} query output and add it to your `KEEP` list. The framework reads that column after each evaluation and maps it to one of five fixed levels: + +| Value | Meaning | +| --- | --- | +| `info` | Informational; lowest urgency | +| `low` | Low-severity condition | +| `medium` | Moderate-severity condition | +| `high` | High-severity condition | +| `critical` | Critical; highest urgency | + +### How the {{alerting-v2-system}} reads severity values + +The {{alerting-v2-system}} reads the `severity` column after each evaluation and applies the following rules: + +- Matching is case-insensitive. +- Values that don't match one of the five levels are silently ignored. The alert episode is still created, but `severity` isn't set. +- Severity is only set on `breached` events. `recovered` and `no_data` events don't carry a severity value. + +### Stored fields + +When severity is set, the {{alerting-v2-system}} stores two fields on the alert episode that are available for action policy matchers: + +| Field | Description | +| --- | --- | +| `severity` | The severity value from the most recent breached event (current state). | +| `severity_max` | The highest severity level observed across the episode's lifetime. Useful for routing like "this episode peaked at critical." | + +Refer to [Rule event and field reference](rule-event-field-reference.md#episode-fields) for more information about these fields. 
+ +### Example + +```esql +FROM metrics-* +| STATS + errors_5m = COUNT_IF(outcome == "failure" AND @timestamp >= NOW() - 5 minutes), + total_5m = COUNT_IF(@timestamp >= NOW() - 5 minutes) + BY service.name +| EVAL burn_5m = errors_5m / total_5m +| EVAL severity = CASE( + burn_5m > 14.4, "critical", + burn_5m > 6.0, "high", + burn_5m > 1.0, "medium", + "low" + ) +| WHERE burn_5m > 1.0 +| KEEP service.name, burn_5m, severity +``` + +- **`STATS`** - Counts failures and total requests in the last 5 minutes, grouped by service. +- **`EVAL burn_5m`** - Computes the error burn rate as a ratio of failures to total requests. +- **`EVAL severity`** - Maps the burn rate to a severity level. +- **`WHERE`** - Only rows above the threshold count as breaches. +- **`KEEP`** - Includes `severity` in the output so the {{alerting-v2-system}} reads and stores it. ## Rule grouping [rule-grouping] +Rule grouping lets a single rule track multiple things independently. For example, a rule monitoring CPU usage across hosts can produce a separate alert series for each host, rather than one alert for everything combined. + +Each group tracks its own lifecycle, recovery, and snooze state. -Rule grouping splits alert event generation by one or more group key fields so that each unique combination of field values produces its own alert series. Each series has independent lifecycle tracking, recovery detection, and per-series snooze. +Rule grouping controls how alert series are created. Notification grouping — configured on an action policy — controls how those alert episodes are batched into messages. These are separate settings. -Group key fields must align with the `BY` clause in your {{esql}} query's `STATS` command. See [Author rules](author-rules.md) for query patterns. +### Aligning `BY` fields with your rule's query -When writing a rule that uses grouping, writing the query first and then specifying group fields avoids mismatches between the query output and the grouping configuration. 
The group fields in the rule form reflect the columns produced by the `STATS ... BY` clause, so if you add or remove a `BY` field in the query, the corresponding group field must be updated to match. +The `BY` fields you specify for grouping must match the columns in the `BY` clause of your {{esql}} `STATS` command. If they don't match, the grouping configuration won't work as expected. -Note that rule grouping is separate from notification grouping on an action policy, which controls how alert episodes batch into messages. +:::{tip} +Write the query first, then set the group fields. That way the `BY` columns are already defined and you can select them directly. If you later add or remove a `BY` field in the query, update the group fields to match. +::: @@ -75,22 +148,32 @@ The lookback must not exceed 365 days. If the lookback is shorter than the execu Activation and recovery thresholds control when alerts transition between lifecycle states. They reduce noise from short spikes and from rapid flapping between active and recovered. -These settings are only available for Alert-mode rules (`kind: alert`). +These settings are only available for Alert-mode rules. ### Activation thresholds -Configure activation using count, timeframe, or both: +Activation thresholds control when a breached rule transitions from pending to active. Three delay modes are available: + +| Mode | Behavior | When to use | +| --- | --- | --- | +| Immediate | Opens an alert episode as soon as the threshold is breached on the first evaluation. | Use when any single breach warrants attention and latency matters. | +| Breaches | Opens an alert episode after the threshold is breached a set number of times in a row. | Use when a single breach isn't enough reason to act (for example, when brief spikes are normal and you only care if the condition keeps firing). | +| Duration | Opens an alert episode after the threshold has been continuously breached for a set time. 
| Use when duration of the problem matters more than how many evaluations caught it (for example, sustained high CPU rather than a momentary spike). | + +You can also use Breaches and Duration together. For example, require the threshold to be breached five times in a row _and_ persist for at least two minutes before an alert episode opens. Use `pending_operator` to control whether both constraints must be met (`AND`) or either one is enough (`OR`). + +Configure these modes using the following fields. Timeframe values must be between 5 seconds and 365 days. | Field | Description | | --- | --- | -| `pending_count` | Consecutive breaches required | -| `pending_timeframe` | Minimum duration the condition must persist (shown as **Alert delay** in the UI) | -| `pending_operator` | How to combine count and timeframe (`AND` or `OR`) | - -Each timeframe value must be between 5 seconds and 365 days. +| `pending_count` | Number of consecutive breach evaluations required before the alert episode opens. | +| `pending_timeframe` | How long the condition must remain breached before the alert episode opens. | +| `pending_operator` | When both `pending_count` and `pending_timeframe` are set, controls whether both must be satisfied (`AND`) or either one is enough (`OR`). | ### Recovery thresholds +Recovery thresholds control when an active alert episode transitions back to inactive. The same delay modes available for activation — consecutive recoveries required, minimum recovery duration, or both — apply here. + | Field | Description | | --- | --- | | `recovering_count` | Consecutive recoveries required | @@ -100,7 +183,9 @@ Each timeframe value must be between 5 seconds and 365 days. Time frame fields use the same 5 seconds to 365 days bounds as activation timeframes. :::{note} -The `recovery_policy` field controls how recovery is detected, separately from how many recoveries are required. 
When creating a rule through the UI, `recovery_policy.type` defaults to `no_breach`, which recovers the alert episode when its active group no longer appears in the breach batch. When creating a rule through the API or Agent Builder, you can omit `recovery_policy` entirely to suppress recovery events and keep alert episodes active until closed manually. For the full field reference, go to [YAML rule schema reference](yaml-rule-schema-reference.md#recovery-policy-fields). +The `recovery_policy` field controls how recovery is detected, separately from how many recoveries are required. When creating a rule through the UI, `recovery_policy.type` defaults to `no_breach`, which recovers the alert episode when its active group no longer appears in the breach batch. + +When creating a rule through Agent Builder, you can omit `recovery_policy` entirely to suppress recovery events and keep alert episodes active until closed manually. For the full field reference, go to [YAML rule schema reference](yaml-rule-schema-reference.md#recovery-policy-fields). ::: ## No-data handling [no-data-handling] @@ -119,12 +204,12 @@ Set `no_data.behavior` to one of the following values: These behaviors apply when the base query returns zero rows. They don't help when you want to *detect* that a specific host or data source has gone silent. That requires a different query approach. See [No-data detection](esql-query-patterns.md#no-data-esql-query) in the authoring guide for an {{esql}} pattern that surfaces silent sources as alert rows. -## Tags and investigation guide [tags-investigation] +## Tags and runbooks [tags-investigation] Alert-mode rules support two optional metadata fields: -- **Tags**: Free-form labels for filtering and organization. -- **Investigation guide**: A runbook stored with the rule so responders have context when an alert fires. +- **Tags** - Free-form labels for filtering and organization. 
+- **Runbooks** - An investigation guide stored with the rule so responders have context when alerts are generated. ## Evaluate rule output [evaluate-rule-output] diff --git a/explore-analyze/alerting/kibana-alerting-experimental/rules/create-rule-from-discover.md b/explore-analyze/alerting/kibana-alerting-experimental/rules/create-rule-from-discover.md index 25a6aae0e1..c1399cbb66 100644 --- a/explore-analyze/alerting/kibana-alerting-experimental/rules/create-rule-from-discover.md +++ b/explore-analyze/alerting/kibana-alerting-experimental/rules/create-rule-from-discover.md @@ -11,7 +11,9 @@ description: "Turn an ES|QL query in Discover into a rule in the experimental al # Create rules from Discover in the {{alerting-v2-system}} [create-rules-discover] -Discover-based rule creation is part of the {{alerting-v2-system}} in {{kib}}. When you build an {{esql}} query that surfaces interesting patterns, you can convert it into a rule without rewriting the query. For the full rule form including schedule and lifecycle settings, refer to [Configure a rule](configure-a-rule.md). +Discover-based rule creation is part of the {{alerting-v2-system}} in {{kib}}. When you build an {{esql}} query that surfaces interesting patterns, you can convert it into a rule without rewriting the query. + +This page covers the entry points for starting a rule from Discover, how the query pre-fills the rule form, and how to use the preview panel to verify grouping and output shape before saving. For the rule form, including schedule and lifecycle settings, refer to [Configure a rule](configure-a-rule.md). 
## Entry points [discover-rule-entry-points] diff --git a/explore-analyze/alerting/kibana-alerting-experimental/rules/create-rule-from-rule-builder.md b/explore-analyze/alerting/kibana-alerting-experimental/rules/create-rule-from-rule-builder.md index 3822417ea8..2b5c247544 100644 --- a/explore-analyze/alerting/kibana-alerting-experimental/rules/create-rule-from-rule-builder.md +++ b/explore-analyze/alerting/kibana-alerting-experimental/rules/create-rule-from-rule-builder.md @@ -10,35 +10,67 @@ description: "Create ES|QL rules, AI-assisted rules, and Threshold Alert rules i # Create rules in the {{alerting-v2-system}} [create-rules-rule-builder] -The rule builder is part of the {{alerting-v2-system}} in {{kib}}. For a full description of what each setting does, refer to [Configure a rule](configure-a-rule.md). +The rule builder is part of the {{alerting-v2-system}} in {{kib}}. This page covers the three creation paths available from the rules list, how to configure an {{esql}} rule, and how the Threshold Alert builder works. For descriptions of what each setting does, refer to [Configure a rule](configure-a-rule.md). ## Creation paths [rule-creation-paths] All rules are created through a flyout that opens from the **Create rule** button in the rules list. Three options are available: -- **Create ES|QL rule**: Write the detection query as {{esql}} directly, with a live preview of results. A YAML editor is also available within this path. Use this when you want full control over the query. If you already have a query working in Discover, you can [start from there instead](create-rule-from-discover.md) to skip re-entering it. -- **Create with AI Agent**: Describe what you want to detect in plain language. The AI agent generates a rule definition and walks you through reviewing and saving it. Use this when you know the problem but aren't sure how to write the {{esql}}. -- **Start from a rule builder**: Choose a structured rule type and fill in a guided form. 
The builder generates the {{esql}} query automatically. Use this when you want to create a standard rule type without writing {{esql}} by hand. Refer to [Threshold Alert](#threshold-alert) for the available type. +| Option | What it does | When to use | +| --- | --- | --- | +| **Create ES\|QL rule** | Write the detection query as {{esql}} directly. Includes a query sandbox for previewing results and a YAML editor. If you already have a query working in Discover, you can [start from there instead](create-rule-from-discover.md). | When you want full control over the query. | +| **Create with AI Agent** | Describe what you want to detect in plain language. The AI agent generates a rule definition and walks you through reviewing and saving it. | When you know the problem but aren't sure how to write the {{esql}}. | +| **Start from a rule builder** | Choose a structured rule type and fill in a guided form. The builder generates the {{esql}} query automatically. | When you want to create a standard rule type without writing {{esql}} by hand. | -## Threshold Alert +## Create an {{esql}} rule [rule-builder-form-yaml] -Threshold Alert is the rule type available under **Start from a rule builder**. Use it to monitor one or more metrics and alert when they cross a threshold, with multi-condition support and custom aggregations. +The **Create ES|QL rule** path gives you two ways to define the query: -You define the rule by filling in structured fields for the data source, aggregation, filters, and alert conditions. The builder generates the {{esql}} query automatically from those inputs. Rules created through the builder can be reopened and edited in builder mode as long as the underlying {{esql}} hasn't been edited directly. +- **Rule form** - Fill in the step-by-step form with a live preview of results. +- **YAML mode** - Switch to YAML and edit the raw rule definition. You can switch between form and YAML at any point; edits are preserved. 
-Use the **Create ES|QL rule** path when the detection logic requires more than a single metric threshold, such as multi-window burn rates or cross-series correlation. +The YAML editor isn't available within the Threshold Alert builder or other rule builder types. For a list of supported YAML fields, refer to [YAML rule schema reference](yaml-rule-schema-reference.md). -### Recovery conditions [threshold-builder-recovery] +### Preview query results in the sandbox [rule-builder-query-sandbox] -When you define alert conditions in the Threshold Alert builder, the builder automatically derives corresponding recovery conditions by flipping the comparators. For example, a `greater than` alert condition produces a `less than or equal to` recovery condition. You can customize the derived conditions or leave the defaults as generated. Recovery conditions are preserved correctly when you reopen an existing rule in builder mode for editing. +The query sandbox lets you run your {{esql}} query against current data and preview the results before applying them to the rule form. Use the time field selector and date picker to control the time range, then select **Search** (or press ⌘↵) to execute. When the results look correct, select **Apply changes** to populate the form. -## ES|QL rule: form and YAML editing [rule-builder-form-yaml] +Use the sandbox to: -The **Create ES|QL rule** path supports both a step-by-step form and a YAML editing mode. You can switch between them at any point. Edits in YAML mode are preserved when you return to the form view. To discard YAML edits and return to the prior form state, use the **Cancel YAML** option. +- **Confirm grouping** - Check that your `BY` clause produces the series you intend, for example, one distinct series per host or per service, not a single undifferentiated result. +- **Catch unexpected output** - Verify that the query returns data in the right shape for the alert condition you plan to set. 
A query that returns zero rows or an unexpected field name won't behave as expected once the rule runs on a schedule. +- **Refine before committing** - Edit the query and re-run it as many times as needed without leaving the rule creation form. -Use YAML mode when you want to fine-tune the raw rule definition, copy a configuration from an existing rule, or work faster than filling in individual form fields allows. The YAML editor isn't available within the Threshold Alert builder or other rule builder types. +## Create a rule with AI Agent [create-rule-ai-agent] -For a list of supported YAML fields, refer to [YAML rule schema reference](yaml-rule-schema-reference.md). +The **Create with AI Agent** option opens the Elastic AI agent pre-loaded with rule management knowledge. Describe what you want to monitor in plain language and the agent resolves the relevant data source and builds a rule proposal. + +The proposal appears as an inline attachment card in the conversation showing the rule name, type, schedule, and tags. Select the card to open a flyout with three tabs: + +- **Conditions** - The full rule configuration, including query, thresholds, grouping, and schedule. +- **Query preview** - Runs the {{esql}} query and shows results inline so you can verify the detection logic without leaving the conversation. +- **Runbook** - A free-text runbook associated with the rule. + +The agent doesn't save the rule automatically. When the proposal looks correct, select **Save as rule** from the flyout header to persist it. After saving, you can ask the agent to configure notifications, which creates an action policy scoped to that rule. + +:::{note} +Signal-mode rules don't support notifications. If you ask the agent to set up notifications on a Signal-mode rule, the agent explains the limitation and offers to convert the rule to Alert mode or create a new Alert-mode rule. 
+::: + +### Example prompts for creating rules [ai-agent-sample-prompts] + +Use these prompts as a starting point, then adjust them to your data and thresholds: + +- Create an error threshold rule on my checkout service data. Alert when there are more than 3 HTTP 5xx errors in the past 5 minutes, grouped by URL path. +- Monitor average CPU usage across all hosts. Alert when any host exceeds 90% for more than 10 minutes. +- Alert me when log volume from the payments service drops below 100 events in a 5-minute window. This likely means data has stopped flowing. +- Set up a rule that tracks error rate by service. Alert at medium severity when the rate exceeds 1%, and critical when it exceeds 5%. + +## Use the rule builder [use-rule-builder] + +The **Start from a rule builder** option provides a guided form for creating rules without writing {{esql}} by hand. The builder generates the {{esql}} query automatically from structured inputs for the data source, aggregation, filters, and alert conditions. + +### Threshold Alert [use-threshold-alert-builder] + +Threshold Alert is the only rule type available in the rule builder. Use it to monitor one or more metrics and alert when they cross a threshold, with multi-condition support and custom aggregations. - diff --git a/explore-analyze/alerting/kibana-alerting-experimental/rules/create-rule-with-yaml.md b/explore-analyze/alerting/kibana-alerting-experimental/rules/create-rule-with-yaml.md index 89834259e3..d19e3d9fab 100644 --- a/explore-analyze/alerting/kibana-alerting-experimental/rules/create-rule-with-yaml.md +++ b/explore-analyze/alerting/kibana-alerting-experimental/rules/create-rule-with-yaml.md @@ -11,11 +11,30 @@ description: "Define rules as YAML in Kibana's experimental alerting system for # Create rules using the YAML editor in {{alerting-v2-system}} [create-rules-yaml] -The YAML editor is part of the {{alerting-v2-system}} in {{kib}}. It lets you define rules as text documents rather than filling in a form. 
Use it when you want to version-control rule definitions alongside your other configuration, manage rules through infrastructure-as-code tooling, copy or adapt a rule quickly without re-entering settings by hand, or provision many rules at once. +The YAML editor is part of the {{alerting-v2-system}} in {{kib}}. It lets you define rules as text documents rather than filling in a form. -If you're creating a rule from scratch and want guidance through each setting, the [rule builder](create-rule-from-rule-builder.md) is the better starting point. If you have a query already working in Discover, you can [create a rule directly from there](create-rule-from-discover.md). +This page covers when to use the YAML editor instead of the rule builder, and how to get started. For the full list of supported fields and accepted values, refer to [YAML rule schema reference](yaml-rule-schema-reference.md). -For the full list of supported YAML fields and their accepted values, refer to [YAML rule schema reference](yaml-rule-schema-reference.md). +## When to use the YAML editor + +Use the YAML editor when you want to copy or adapt a rule quickly without re-entering settings by hand, or provision many rules at once. If you're creating a rule from scratch and want guidance through each setting, the [rule builder](create-rule-from-rule-builder.md) is the better starting point. If you have a query already working in Discover, you can [create a rule directly from there](create-rule-from-discover.md). + +## YAML-only mode when editing rules [yaml-only-edit] + +When you reopen a rule for editing, the form/YAML toggle is disabled if the rule's YAML configuration contains settings the form cannot represent. The rule opens in YAML-only mode. This prevents the form from silently dropping fields it doesn't know how to display on save. The YAML editor remains fully functional, and all fields round-trip without loss. 
+ +The following configurations force YAML-only mode when editing: + +| Configuration | Why the form can't represent it | +| --- | --- | +| `query.format: standalone` with `kind: alert` | The form is built around the composed query format (base query plus condition blocks). Standalone format rules cannot be loaded into the form editor. | +| `recovery_strategy: no_breach` or `recovery_strategy: none` | The form only supports custom recovery queries. The no-breach and none strategies have no form equivalent. | +| `no_data_strategy` (any active value) | The form has no controls for no-data handling. | +| `query.no_data` block | The form has no UI for inline no-data query definitions. | + +If the toggle is disabled and you want to use the form, you must remove the non-representable configuration from the YAML before saving, then reopen the rule. + + diff --git a/explore-analyze/alerting/kibana-alerting-experimental/rules/esql-query-patterns.md b/explore-analyze/alerting/kibana-alerting-experimental/rules/esql-query-patterns.md index 2cddc60b1d..2c2f815653 100644 --- a/explore-analyze/alerting/kibana-alerting-experimental/rules/esql-query-patterns.md +++ b/explore-analyze/alerting/kibana-alerting-experimental/rules/esql-query-patterns.md @@ -13,7 +13,7 @@ description: "Advanced ES|QL query patterns for rules in the experimental alerti ES|QL query patterns for rules are part of the {{alerting-v2-system}} in {{kib}}. Some detection problems can't be expressed as a single metric compared to a fixed threshold. You might need to know whether an SLO is burning through its error budget across multiple time windows at once. Or whether a specific host has gone silent, rather than whether the query returned nothing. Or whether a condition has persisted continuously across consecutive time buckets rather than appearing once. These are structurally different problems that require different query shapes. -Use this page when a basic `STATS ... 
WHERE` pattern isn't enough, or when the detection logic itself requires multi-window calculation, last-seen reasoning, or bucket-level persistence checks. If you're still learning how rules in the {{alerting-v2-system}} work, start with [Author rules](author-rules.md) first. +This page covers query patterns for SLO burn rate detection across multiple windows, no-data detection for silent hosts or stopped sources, and persistent breach detection using bucket-level continuity checks. Use it when a basic `STATS ... WHERE` pattern isn't enough. If you're still learning how rules in the {{alerting-v2-system}} work, start with [Author rules](author-rules.md) first. ## Basic threshold query [threshold-query] @@ -83,7 +83,7 @@ The burn rate multipliers (14.4×, 6×) reflect standard SLO error budget consum Because the query computes several window pairs in one pass, the lookback window on the rule must cover the longest window in the query (3 days in the example above). -The `severity` column in `KEEP` maps directly to `episode.severity` on each resulting alert episode. For the accepted values and matching rules, see [Severity levels](author-rules.md#severity-levels). +The `severity` column in `KEEP` maps directly to `episode.severity` on each resulting alert episode. For the accepted values and matching rules, refer to [Severity levels](author-rules.md#severity-levels). 
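
As a rough illustration of how a query can emit a `severity` column, the following sketch assigns a level from an error rate and keeps it in the output. The `logs-*` index pattern, the `http.response.status_code` and `service.name` fields, and the 1% and 5% thresholds are assumptions chosen for this example. They are not taken from the burn rate query above.

```esql
// Hypothetical data source and field names; adjust to your own data.
FROM logs-*
// Count all events and 5xx events per service in one pass.
| STATS total = COUNT(*),
        errors = COUNT(*) WHERE http.response.status_code >= 500
  BY service.name
// Convert to double so the division is not truncated to an integer.
| EVAL error_rate = TO_DOUBLE(errors) / total
// Only rows above the lowest threshold count as a breach.
| WHERE error_rate > 0.01
// Map the rate to one of the recognized severity levels.
| EVAL severity = CASE(error_rate > 0.05, "critical", "medium")
| KEEP service.name, error_rate, severity
```

Deriving `severity` with `CASE` after the `WHERE` filter means every row that reaches `KEEP` already represents a breach, so the column always carries one of the recognized levels.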
## No-data detection [no-data-esql-query] diff --git a/explore-analyze/alerting/kibana-alerting-experimental/rules/rule-event-field-reference.md b/explore-analyze/alerting/kibana-alerting-experimental/rules/rule-event-field-reference.md index dc275b3e35..978af181d8 100644 --- a/explore-analyze/alerting/kibana-alerting-experimental/rules/rule-event-field-reference.md +++ b/explore-analyze/alerting/kibana-alerting-experimental/rules/rule-event-field-reference.md @@ -120,5 +120,4 @@ These fields only appear on documents with `type: alert`, written by rules runni | `episode.id` | keyword | Episode identifier for this series. | | `episode.status` | keyword | One of: `inactive`, `pending`, `active`, `recovering`. | | `episode.status_count` | long | Count of consecutive evaluations in the current `episode.status`. Only set when `episode.status` is `pending` or `recovering`. | -| `episode.severity` | keyword | Severity level from the most recent breached event. One of: `info`, `low`, `medium`, `high`, `critical`. Not set when the query output does not include a `severity` column, or when the value does not match a recognized level. Never set on `recovered` or `no_data` events. | -| `episode.severity_max` | keyword | Highest severity level observed across the episode's lifetime (high-water mark). Enables routing or display based on peak severity, for example, "this episode peaked at `critical`". | +| `severity` | keyword | Severity level from the most recent breached event. One of: `info`, `low`, `medium`, `high`, `critical`. Not set when the query output does not include a `severity` column, or when the value does not match a recognized level. Never set on `recovered` or `no_data` events. 
| diff --git a/explore-analyze/alerting/kibana-alerting-experimental/rules/view-manage-rules.md b/explore-analyze/alerting/kibana-alerting-experimental/rules/view-manage-rules.md index 1803f0bd96..fda179e400 100644 --- a/explore-analyze/alerting/kibana-alerting-experimental/rules/view-manage-rules.md +++ b/explore-analyze/alerting/kibana-alerting-experimental/rules/view-manage-rules.md @@ -10,38 +10,37 @@ description: "Search, filter, sort, and bulk-manage rules from the rules list in # View and manage rules in the {{alerting-v2-system}} [manage-rules] -Rule management is part of the {{alerting-v2-system}} in {{kib}}. After a rule is created, edit its settings, pause it, remove it, and more from the page listing rules. The rules list gives you search, filter, sort, and bulk actions across all rules in the space. Selecting a rule name opens its details page, where you can review the full configuration and edit or act on it directly. +Rule management is part of the {{alerting-v2-system}} in {{kib}}. This page covers how to find and filter rules in the rules list, quick-edit settings without leaving the list, use the rule summary flyout, and navigate the rule details page including the alert activity timeline. ## Find and filter rules [find-filter-rules] -Use the search bar at the top of the rules list to find rules by name or description. The search field accepts plain text. Each space-separated term is matched independently (terms are AND'd) using prefix matching. Type any part of a rule name or description to narrow the list. +Use the search bar to find rules by name or description. Each space-separated term is matched independently using prefix matching. Tags and grouping fields appear in results but aren't searchable. -Search matches rule name and description only. Tags and grouping fields are displayed in search results but aren't included in free-text search. 
Special characters in rule names and descriptions, including consecutive hyphens, are handled correctly. +Combine text search with filter controls to narrow by rule type, status, or tags. Select any column header to sort, or use bulk actions to enable, disable, or delete multiple rules at once. -Combine text search with filter controls to narrow results further by rule type, status, or tags. Select any column header to sort the list. Use bulk actions when you need to enable, disable, or delete multiple rules at once. - -## Quick-edit a rule [quick-edit-rule] +## Edit a rule inline [quick-edit-rule] To update common rule settings without opening the full rule details page, use the inline edit option on any row in the rules list. The inline editor also opens from the rule summary flyout header. -Editable fields in this view include name, description, tags, grouping key, time field, interval, lookback window, alert delay, recovery type, and recovery delay. The {{esql}} query and alert tracking behavior are read-only in this view. For changes to the query or rule mode, open the full rule details page. - Use inline edit when you need to adjust metadata or scheduling settings quickly without navigating away from the list. -## Rule summary flyout [rule-summary-flyout] +## Inspect a rule with the summary flyout [rule-summary-flyout] To inspect a rule without navigating away from the rules list, select the expand icon on any row. The rule summary flyout opens alongside the list and shows a snapshot of the rule: its status, last run time, recent alert episode activity, and quick actions such as enable, disable, and snooze. Use the flyout when you want to confirm a rule is healthy or take a quick action without committing to a full page load. To open the complete rule configuration with all settings and edit controls, select the rule name in the table row or in the flyout header. 
-## Rule details page +## Review rule configuration and activity [rule-details-page] + +The rule details page is organized into tabs that let you review a rule's configuration and activity history. +- **Overview** (Alert mode only): Shows an alert activity timeline for each series the rule tracks. The timeline displays a color-coded history of alert episode state transitions per series, along with summary statistics: alert episodes started, recovered, still open, and median duration. A link to view all matching alert episodes is available, filtered to the current rule and time range. Lane labels appear only for grouped rules. The overview tab is not shown for Signal-mode rules, which don't open alert episodes. - **Conditions**: The full {{esql}} base query, alert condition, schedule, lookback, grouping, and recovery settings. -- **Runbook**: If the rule has an investigation guide, it appears here alongside Conditions. +- **Runbook**: If the rule has an investigation guide, it appears here. Use **Edit** to modify the rule, or the actions menu to enable, disable, clone, or delete it. -## Disable versus snooze +## Disable or snooze a rule [disable-snooze-rule] Use **Disable** when you want the rule to stop running entirely until you re-enable it. This is different from snoozing, which suppresses notifications or quiets a series or policy without stopping rule evaluation. diff --git a/explore-analyze/alerting/kibana-alerting-experimental/rules/yaml-rule-schema-reference.md b/explore-analyze/alerting/kibana-alerting-experimental/rules/yaml-rule-schema-reference.md index 8937f0c0cd..7bd9948ceb 100644 --- a/explore-analyze/alerting/kibana-alerting-experimental/rules/yaml-rule-schema-reference.md +++ b/explore-analyze/alerting/kibana-alerting-experimental/rules/yaml-rule-schema-reference.md @@ -21,6 +21,9 @@ YAML rule schema is part of the {{alerting-v2-system}} in {{kib}}. This page lis | `metadata.name` | string | Max 256 characters | The name of the rule. 
| | `schedule.every` | duration | For example, `5s`, `1m`, `5m` | How often the rule runs. Minimum interval applies. | | `evaluation.query.base` | string | Valid {{esql}} query, max 10,000 characters | The query that checks your data on each run. | +| `evaluation.query.blocks` | array of strings | Valid {{esql}} expressions | Optional additional {{esql}} expressions appended to the base query. Each block is applied in sequence after the base query runs. Use blocks to define reusable conditions or to separate the base data shape from alert conditions. | + + ## Metadata fields