4 changes: 2 additions & 2 deletions docset.yml
@@ -123,8 +123,8 @@ subs:
ls-pipelines-app: "Logstash Pipelines"
maint-windows-app: "Maintenance Windows"
maint-windows-cap: "Maintenance windows"
- alerting-v2: "experimental alerting features"
- alerting-v2-cap: "Experimental alerting features"
+ alerting-v2-system: "experimental alerting system"
+ alerting-v2-system-cap: "Experimental alerting system"
custom-roles-app: "Custom Roles"
data-source: "data view"
data-sources: "data views"
34 changes: 34 additions & 0 deletions explore-analyze/alerting/kibana-alerting-experimental/rules.md
@@ -0,0 +1,34 @@
---
navigation_title: Rules
applies_to:
stack: experimental 9.5+
serverless: experimental
products:
- id: kibana
description: "Rules in Kibana's experimental alerting system define what to detect using ES|QL. Evaluation runs on a schedule; alerts, action policies, and notifications flow from rule detections."
---

# Rules in the {{alerting-v2-system}}

A rule is where the {{alerting-v2-system}} starts. It points {{kib}} at the data you care about, uses {{esql}} to describe what counts as a problem, and says how often to check. Alerts, action policies, and notifications all flow from what a rule detects.

This page explains what rules do, what they don't control, and how to choose a creation path.

## What rules do [detection-and-notification]

On each run, a rule executes an {{esql}} query against your data. If the query finds a match and the rule is in Signal mode, it writes a _signal_, a point-in-time record that the condition was met. In Alert mode, it also maintains an _alert episode_ for each matched series, tracking state from first breach through recovery.

When creating a rule, choose Signal mode to record and query results without alerting anyone, or Alert mode when you want to track issues and route notifications.
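
As a rough illustration of the detection step, the following sketch shows the shape of a simple event-filter query. The `log.level` condition and the fields in `KEEP` are assumptions chosen for this example, not values the rule requires. `?_tstart` and `?_tend` are the time-bound parameters that the rule executor supplies on each run.

```esql
FROM logs-*
// Limit the query to the window the rule is evaluating
| WHERE @timestamp >= ?_tstart AND @timestamp < ?_tend
// Hypothetical condition: what counts as a problem for this rule
| WHERE log.level == "error"
| KEEP @timestamp, service.name, message
```

In Signal mode, each row this query returns would be recorded as a signal. In Alert mode, the rule would also maintain an alert episode for each matched series.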

## What rules don't do

Rules only define *what* to detect. They don't control notifications, who gets notified, or when. That's the job of action policies, which are global objects scoped to your space that match alert episodes from any rule. A rule has no say in which policies pick it up.

This separation means you can build and test a rule without anyone getting paged, update notification routing without touching the rule, and have multiple action policies respond to the same rule independently.

## Next steps

- **[Create a rule](rules/create-a-rule.md):** Compare creation paths and choose the one that fits your workflow.
- **[Configure a rule](rules/configure-a-rule.md):** Set the schedule, grouping, activation thresholds, recovery conditions, and no-data behavior.
- **[View and manage rules](rules/view-manage-rules.md):** Enable, disable, clone, delete, and bulk-manage rules from the rules list.
- **[ES|QL query patterns](rules/esql-query-patterns.md):** Browse query patterns ordered by complexity, from a basic event filter to SLO burn rate and persistent breach detection.
@@ -0,0 +1,27 @@
---
navigation_title: Configure a rule
applies_to:
stack: experimental 9.5+
serverless: experimental
products:
- id: kibana
description: "Configure rules in the experimental alerting system: mode, ES|QL query, grouping, schedule, activation and recovery thresholds, no-data handling, tags, and evaluation."
---

# Configure a rule in the {{alerting-v2-system}} [rule-settings]

Rules in the {{alerting-v2-system}} have three required settings and several optional ones. The following table lists each setting, what it controls, and whether it is required to save a rule.

<!-- TODO: Uncomment when PR #6525 (workflows/notifications) is merged:
For notification routing, refer to [Notifications](../notifications.md).
-->

| Setting | Description | Required |
| --- | --- | --- |
| [Rule mode](configure-rule-mode.md) | Signal or Alert. Controls whether matching rows generate signal documents or tracked alert episodes. | Required |
| [ES\|QL query](configure-rule-query.md) | The detection logic and the parameters available in query expressions. | Required |
| [Schedule and lookback](configure-rule-schedule.md) | How often the rule evaluates and how far back the query looks. | Required |
| [Severity](configure-rule-severity.md) | Assign severity levels to alert episodes using a `severity` column in query output. | Optional |
| [Grouping](configure-rule-grouping.md) | Track multiple subjects (hosts, services, users) as independent alert series in one rule. | Optional |
| [Activation and recovery thresholds](configure-rule-thresholds.md) | Reduce noise with delay modes for opening and closing alert episodes. (Alert mode only) | Optional |
| [No-data handling](configure-no-data-handling.md) | What the rule records when the base query returns no results. | Optional |
| [Tags and runbooks](configure-rule-tags.md) | Free-form labels and investigation guides attached to the rule. (Alert mode only) | Optional |
@@ -0,0 +1,40 @@
---
navigation_title: No-data handling
applies_to:
stack: experimental 9.5+
serverless: experimental
products:
- id: kibana
description: "Configure no-data handling for rules in Kibana's experimental alerting system to control what happens when a query returns no results."
---

# No-data handling in the {{alerting-v2-system}} [no-data-handling]

No-data handling is an optional setting for rules in the {{alerting-v2-system}}. Use `no_data_strategy` to control what the rule records when the base query returns no results. Setting this correctly prevents false recoveries and misleading `no_data` events when data sources stop reporting.

Set `no_data_strategy` to one of the following values:

| Value | Description |
| --- | --- |
| `emit` | Record a no-data event. |
| `last_known_status` | Hold the last known lifecycle state. An active breach stays active; a recovered episode stays recovered. |
| `recover` | Treat absence as recovery. |
| `none` | Turn off no-data detection. |

:::{note}
`no_data_strategy` does not detect when a specific host or data source stops reporting while others continue. Refer to [No-data detection](esql-no-data-detection.md) for an {{esql}} pattern that surfaces silent sources as alert rows.
:::

## Examples

### Maintain alert state during a metrics collection outage

This rule monitors infrastructure CPU. If the metrics collection agent stops sending data, you don't want an active CPU breach to auto-recover only because the query returned nothing. Set `no_data_strategy` to `last_known_status`. The rule holds the alert in its current state until data resumes.

Use this when an empty query result most likely means a pipeline problem rather than a genuine recovery.
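
A query for this scenario might look like the following sketch. The `metrics-*` index pattern, the `system.cpu.total.pct` field, and the 0.90 threshold are assumptions for illustration. If the agent stops reporting, no documents fall inside the lookback window, so the grouped `STATS` returns no rows. That is the situation in which `last_known_status` would keep an active episode open.

```esql
FROM metrics-*
| WHERE @timestamp >= ?_tstart AND @timestamp < ?_tend
| STATS avg_cpu = AVG(system.cpu.total.pct) BY host.name
| KEEP host.name, avg_cpu
// Alert condition (assumed threshold): average CPU above 90%
| WHERE avg_cpu > 0.90
```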

### Surface a broken data pipeline as an alert

This rule monitors for login events from an identity provider. If no events appear in the lookback window, it's unusual enough to warrant attention: either the pipeline is broken or something has suppressed activity. Set `no_data_strategy` to `emit`. The absence is recorded as a `no_data` event in `.rule-events`, making it visible alongside other rule activity.

Use this when receiving no data is itself a signal worth investigating.
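
One possible shape for this rule's query is sketched here. It assumes ECS-style authentication events with single-valued `event.category` and `event.outcome` fields, so adjust the filter to match your identity provider's data. When no login events fall inside the lookback window, the grouped `STATS` returns no rows, which is the case where `emit` would record a `no_data` event.

```esql
FROM logs-*
| WHERE @timestamp >= ?_tstart AND @timestamp < ?_tend
// Assumed field and value for identity provider login events
| WHERE event.category == "authentication"
| STATS login_count = COUNT(*) BY event.outcome
| KEEP event.outcome, login_count
```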
@@ -0,0 +1,61 @@
---
navigation_title: Grouping
applies_to:
stack: experimental 9.5+
serverless: experimental
products:
- id: kibana
description: "Configure rule grouping in Kibana's experimental alerting system to track multiple subjects as independent alert series."
---

# Rule grouping in the {{alerting-v2-system}} [rule-grouping]

Rule grouping is an optional setting in the {{alerting-v2-system}} that lets a single rule track multiple things independently. For example, a rule monitoring CPU usage across hosts can produce a separate alert series for each host, rather than one alert for everything combined.

<!-- TODO: Confirm with engineering before adding a sentence here:
1. Is snooze state per series or per rule? If rule-level only, do not include it.
2. Does "lifecycle" mean the standard inactive → pending → active → recovering episode states? If so, rewrite as: "In Alert mode, each group becomes its own alert episode with an independent lifecycle. One group can be active while another has recovered, and notifications apply per episode, not across all groups combined."
-->

:::{note}
Rule grouping controls how alert series are created. Notification grouping (configured on an action policy) controls how those alert episodes are batched into messages. These are separate settings.
:::

## Aligning `BY` fields with your rule's query

The `BY` fields you specify for grouping must match the columns in the `BY` clause of your {{esql}} `STATS` command. If they don't match, the grouping configuration won't work as expected.

:::{tip}
Write the query first, then set the group fields. That way the `BY` columns are already defined and you can select them directly. If you later add or remove a `BY` field in the query, update the group fields to match.
:::

<!--[CONTENT NEEDED for M2: M2 replaces the current `grouping.fields` approach with a `track_by` concept and introduces a `series.*` block that gives each series a stable, explicit identity. Update this section to document the `track_by` configuration, explain how the `series.*` block differs from the current `group_hash` approach, and revise any references to `grouping.fields` or the `BY` clause alignment requirement once the M2 schema is finalized.]
-->

## Examples

### Track error rates per service

This rule counts HTTP errors per service and opens a separate alert series for each service that exceeds the threshold. Each service gets its own lifecycle: if the checkout service recovers but the payments service stays critical, those are tracked independently.

```esql
FROM logs-*
| WHERE @timestamp >= ?_tstart AND @timestamp < ?_tend
| STATS error_count = COUNT_IF(http.response.status_code >= 500) BY service.name // Group field: one series per service
| WHERE error_count > 10
| KEEP service.name, error_count
```

Without a matching `grouping.fields` entry, the rule treats all services as a single combined series. A spike in one service would activate the alert for everything, and recovery requires all services to drop below the threshold at the same time.

### Track CPU usage per host and region

When the query groups by multiple fields, include them all in `grouping.fields` to create one alert series per unique combination.

```esql
FROM metrics-*
| WHERE @timestamp >= ?_tstart AND @timestamp < ?_tend
| STATS avg_cpu = AVG(system.cpu.total.pct) BY host.name, cloud.region // Group fields: one series per host+region pair
| WHERE avg_cpu > 0.90
| KEEP host.name, cloud.region, avg_cpu
```
@@ -0,0 +1,28 @@
---
navigation_title: Rule mode
applies_to:
stack: experimental 9.5+
serverless: experimental
products:
- id: kibana
description: "Choose between Signal and Alert mode for rules in Kibana's experimental alerting system."
---

# Rule mode in the {{alerting-v2-system}} [rule-mode]

Rule mode is a required setting for rules in the {{alerting-v2-system}}. The mode is set by the rule creation method, and some creation paths support only one mode. Refer to [Create a rule](create-a-rule.md) for available options.

| Mode | Behavior | Best for |
| --- | --- | --- |
| Signal | Records each matching row as a signal document. No alert episodes, no notifications. | Testing a new query, building detection history, or observing matches without notifying anyone. |
| Alert | Creates an alert episode for each matching row. Episodes are tracked through lifecycle states, appear on the **Alerts** page, and can be routed to notifications by action policies. | Production rules where breaches should be tracked, escalated, or routed to a notification channel. |

## Examples

### Build detection history before enabling alerts

You're writing a new detection query and want to verify it produces the results you expect before anyone gets paged. Create the rule in Signal mode so matches are recorded in `.rule-events` and you can inspect them in Discover without opening any alert episodes or triggering notifications. Once the matches look correct, edit the rule and switch it to Alert mode.

### Route critical episodes to an on-call workflow

You have a checkout service error rate rule and want on-call engineers notified when it fires. Create the rule in Alert mode so each breach opens a tracked episode that action policies can route to a notification channel. The rule's episodes appear on the **Alerts** page and are visible to any action policy whose KQL matcher matches the episode fields.
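
The query behind such a rule could resemble the following sketch. The service name `checkout` and the 5% threshold are assumptions for this example. `CASE` maps each request to 1.0 or 0.0, so the average is the fraction of requests in the window that returned a 5xx status.

```esql
FROM logs-*
| WHERE @timestamp >= ?_tstart AND @timestamp < ?_tend
| WHERE service.name == "checkout" // Assumed service name
// Fraction of requests in the window that returned a 5xx status
| STATS error_rate = AVG(CASE(http.response.status_code >= 500, 1.0, 0.0)) BY service.name
| KEEP service.name, error_rate
// Alert condition (assumed threshold): more than 5% of requests failed
| WHERE error_rate > 0.05
```

Because `service.name` and `error_rate` are in `KEEP`, they are among the episode fields that an action policy matcher can use.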
@@ -0,0 +1,96 @@
---
navigation_title: ES|QL query
applies_to:
stack: experimental 9.5+
serverless: experimental
products:
- id: kibana
description: "Configure the ES|QL query and query parameters for rules in Kibana's experimental alerting system."
---

# {{esql}} query in the {{alerting-v2-system}} [esql-query-rule]

Every rule in the {{alerting-v2-system}} uses an {{esql}} query to define what to evaluate. The query has two parts: a base query that shapes and filters the data, and an optional alert condition that determines which rows become alert events. For more advanced use cases, the query also supports [dynamic values](#dynamic-query-values) for filtering by the evaluation window or setting configurable thresholds through the rule form.

## Base query [query-base]

The base query is the main {{esql}} expression. Use `FROM` to point the rule at the indices or data streams to read. Shape results with `STATS`, `WHERE`, and `EVAL`, and control which fields are stored with `KEEP`. The base query runs on every evaluation, even when no match occurs, which is what enables no-data detection and recovery. The [{{esql}} reference](elasticsearch://reference/query-languages/esql.md) covers all available commands and processing functions.

This query counts HTTP 5xx errors per service over the lookback window and stores only the fields needed for triage:

```esql
FROM logs-*
| WHERE @timestamp >= ?_tstart AND @timestamp < ?_tend
| STATS error_count = COUNT_IF(http.response.status_code >= 500) BY service.name
| KEEP service.name, error_count
```

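Without an alert condition, every row returned by this query is treated as a breach. Because `STATS ... BY service.name` returns one row for each service that has documents in the window, the rule fires for every one of those services, including any whose `error_count` is 0.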

## Alert condition [query-alert-condition]

The alert condition is an optional `WHERE` clause appended after the base query. Only rows that pass the condition are treated as breaches. Use an alert condition when the base query returns aggregate results and you only want to alert when a value crosses a threshold.

This example extends the preceding base query so that the rule fires only when a service exceeds 10 errors:

```esql
FROM logs-*
| WHERE @timestamp >= ?_tstart AND @timestamp < ?_tend
| STATS error_count = COUNT_IF(http.response.status_code >= 500) BY service.name
| KEEP service.name, error_count
// Alert condition: only services with more than 10 errors become breaches
| WHERE error_count > 10
```

The `KEEP` command controls which fields appear on each stored alert event. Only the fields in `KEEP` are available for action policy matchers, grouping keys, and triage.

## Use dynamic values in your rule query [dynamic-query-values]

{{esql}} rule queries support two kinds of parameters that make queries more dynamic: time bounds that the executor injects automatically, and form variables you define when creating a rule. You don't need either to write a working rule, but they're useful for scoping queries precisely to the evaluation window or making thresholds configurable without editing the query.

### Filter your query to the evaluation window (`?_tstart` and `?_tend`) [time-bound-parameters]

`?_tstart` and `?_tend` are reserved parameter names that the rule executor binds automatically on every evaluation. They hold the start and end timestamps of the lookback window, so you can scope a query to exactly the period the rule is evaluating:

```esql
FROM logs-*
| WHERE @timestamp >= ?_tstart AND @timestamp < ?_tend
| STATS error_count = COUNT(*) BY service.name
| WHERE error_count > 10
```

These parameters work across all rule creation methods.

### Set configurable values in the rule form (`?param`) [form-variables]

When creating a rule through the form, you can use `?param` placeholders, such as `?threshold`, as {{esql}} Control variables. The form resolves these variables and writes their values directly into the query before saving. The stored rule and its YAML representation contain the resolved values, not the placeholder tokens.

## Examples

### Scope a query to the evaluation window

This query counts HTTP errors per service over the rule's lookback window. `?_tstart` and `?_tend` are bound automatically at runtime, so the query always covers exactly the configured window regardless of when the rule runs.

```esql
FROM logs-*
| WHERE @timestamp >= ?_tstart AND @timestamp < ?_tend
| STATS error_count = COUNT_IF(http.response.status_code >= 500) BY service.name
| WHERE error_count > 0
| KEEP service.name, error_count
```

If you omit the time filter, the query scans the full index on every evaluation, which increases query cost and can return stale matches from earlier runs.

### Use a form variable for a configurable threshold

This query uses `?threshold` as a form variable so the threshold can be set in the rule form UI without editing the query directly. When the rule is saved, the form resolves `?threshold` to its configured value and writes that value into the query. The stored query contains the literal number, not the placeholder.

```esql
FROM logs-*
| WHERE @timestamp >= ?_tstart AND @timestamp < ?_tend
| STATS p99_latency = PERCENTILE(http.response_time, 99) BY service.name
| WHERE p99_latency > ?threshold
| KEEP service.name, p99_latency
```

Because `?threshold` is resolved before saving, YAML and API representations of this rule always show the resolved value.
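
For illustration, assume the threshold was set to `500` in the rule form (a made-up value). Based on the behavior described in this section, the stored query would then read as follows, with the literal in place of `?threshold`:

```esql
FROM logs-*
| WHERE @timestamp >= ?_tstart AND @timestamp < ?_tend
| STATS p99_latency = PERCENTILE(http.response_time, 99) BY service.name
| WHERE p99_latency > 500 // Resolved from ?threshold when the rule was saved
| KEEP service.name, p99_latency
```

The `?_tstart` and `?_tend` parameters presumably stay in place, because the executor binds them on each evaluation rather than when the rule is saved.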