elastic · nastasha-solomon · May 15, 2026 · May 15, 2026
@@ -0,0 +1,159 @@
+---
+navigation_title: Alerts
+applies_to:
+  stack: unavailable
+  serverless: preview
+products:
+  - id: kibana
+description: "Alert episodes in experimental alerting features: lifecycle states, series and episodes, signals versus alerts, and where to find them."
+---
+
+# Alerts in experimental alerting features
+
+When a rule fires repeatedly on the same problem, a flat list of events doesn't tell you when the issue started, whether it's still happening, or how long it's been going on. Alert episodes fill that gap. Each episode is a persistent record of one issue on one series, from first breach through recovery, with every evaluation appended to the same history. Nothing is overwritten.
+
+<!--[CONTENT NEEDED for M2: UI. Once the navigation and page name have been confirmed, add instructions for opening the Alerts page.]
+-->
+
+## Alert lifecycle [alert-lifecycle]
+
+Every alert episode moves through these states:
+
+```
+inactive → pending → active → recovering → inactive
+```
+
+| State | What it means |
+| --- | --- |
+| Inactive | No problem is being tracked: either nothing has breached yet, or the problem is fully resolved. On recovery, you get a recovery notification. |
+| Pending | Errors detected, but the system is waiting to confirm it's a real problem before fully alerting. |
+| Active | Problem confirmed and ongoing. This is when you get notified. |
+| Recovering | Errors have stopped, but the system is waiting to confirm it's truly resolved. |
+
+<!-- TODO: Uncomment when PR #6523 (rules) is merged:
+Activation and recovery thresholds control how many consecutive evaluations must agree, or how long the condition must persist, before transitioning. Refer to [Configure a rule](rules/configure-a-rule.md#activation-recovery-thresholds) to learn more about these settings.
+-->
+Activation and recovery thresholds control how many consecutive evaluations must agree, or how long the condition must persist, before transitioning.
+
+### Example: First breach opens, first clear closes
+
+A checkout-latency rule runs in Alert mode every 5 minutes. Latency breaches at 14:05 and clears at 14:50:
+
+1. **14:00** — Routine check. p95 is within budget. The episode is `inactive`.
+2. **14:05** — p95 jumps to 3.1s. The first breach is detected. With no activation threshold, the episode opens immediately as `active`.
+3. **14:10–14:45** — Every evaluation finds high latency. The same episode stays `active`. No new episodes are created.
+4. **14:50** — p95 drops back under 2s. With no recovery threshold, the episode resolves immediately to `inactive`.
+
+One problem is tracked in one episode, even though the rule evaluated many times while the condition was ongoing.
+
+<!-- TODO: Add image once alert-episode-example-without-threshold.png is available in explore-analyze/images/
+:::{image} ../../images/alert-episode-example-without-threshold.png
+:alt: Timeline of a checkout-latency alert episode without thresholds. At 14:05, p95 jumps to 3.1s and the episode opens immediately as active. It stays active through 14:45. At 14:50, p95 drops back under 2s and the episode resolves immediately as inactive.
+:::
+-->
+
+### Example: Waiting for confirmation before opening and closing
+
+The same checkout-latency rule, now with an activation threshold of 2 consecutive breaches and a recovery threshold of 2 consecutive clears:
+
+1. **14:00** — Routine check. p95 is within budget. The episode is `inactive`.
+2. **14:05** — p95 jumps to 3.1s. The first breach is detected. The episode is created in `pending` and the system starts counting consecutive breaches.
+3. **14:10** — p95 is still elevated. The second consecutive breach meets the activation threshold. The episode moves from `pending` to `active`, and the engineer is paged.
+4. **14:15–14:45** — Latency stays elevated. The episode remains `active`.
+5. **14:50** — p95 drops back under 2s. The first clean check moves the episode to `recovering`. The system starts counting consecutive clears.
+6. **14:55** — A second consecutive clear meets the recovery threshold. The episode moves from `recovering` to `inactive`.
+
+Thresholds keep brief spikes from activating an episode and transient dips from closing it prematurely. The episode waits in `pending` until the problem is confirmed, and waits in `recovering` until the resolution is confirmed.
+
+<!-- TODO: Add image once alert-episode-example-with-activation-threshold.png is available in explore-analyze/images/
+:::{image} ../../images/alert-episode-example-with-activation-threshold.png
+:alt: Timeline of a checkout-latency alert episode with activation threshold of 2 and recovery threshold of 2. At 14:05, the first breach puts the episode in pending. At 14:10, the second consecutive breach moves it to active. At 14:50, the first clean check moves it to recovering. At 14:55, the second consecutive clear resolves it to inactive.
+:::
+-->
+
+## Series
+
+A series is the ongoing relationship between a rule and one specific thing it monitors.
+
+Suppose your rule monitors services. Each service it tracks has its own series: one for `checkout-service`, one for `payment-service`, and so on. A series exists for as long as that rule keeps monitoring that service.
+
+Think of it like a patient's medical file. The file exists as long as the patient is in the system. Individual health incidents come and go, but the file persists.
+
+<!-- TODO: Uncomment when PR #6523 (rules) is merged:
+For the fields that identify a series in alert event documents, refer to [Rule event and field reference](rules/rule-event-field-reference.md#rule-reference).
+-->
+
+### How series and episodes relate
+
+An episode lives inside a series. A series can contain many episodes over its lifetime, one for each time that service had a problem.
+
+```
+Series: checkout-service
+│
+├── Episode 1: errors on April 10 (active → inactive)
+├── Episode 2: errors on April 15 (active → inactive)
+└── Episode 3: errors on April 18 (active right now)
+```
+
+The series is the container. Episodes are the individual problems that happened within it. When the series breaches again after recovering, a new episode starts.
+
+This means you can track "the checkout service was broken from 02:14 to 03:21" and "the payment service was broken at the same time" as separate episodes, even when both come from the same rule.
+
+:::{tip}
+Snooze operates at the series level, not the episode level. If you snooze `checkout-service`, you silence all notifications from that series for the duration of the snooze, regardless of how many new episodes start during that time. You're quieting a specific ongoing situation, not a single alert.
+:::
+
+### A practical way to think about it
+
+| Concept | Analogy |
+| --- | --- |
+| Rule | A security camera watching the building |
+| Series | The camera's feed for one specific door |
+| Episode | A specific incident caught on that feed |
+| Rule events | The individual video frames |
+
+The camera runs continuously (rule), always watching door 3 (series). One night someone breaks in. That's an episode. The frames captured during the break-in are the rule events.
+
+## Signals versus alerts
+
+Every time a rule finds a match, it writes a document to `.rule-events`. Whether that document is a signal or an alert depends on the rule's mode, and that choice determines whether the system only records what happened or actively tracks it through to resolution.
+
+A **signal** is a one-time observation. The system writes it and moves on: no lifecycle, no notifications, no follow-up. An **alert** participates in an episode. The system links it to every other document from the same problem, tracks the lifecycle states, and routes notifications through action policies.
+
+| Type | What it is | When it's created |
+| --- | --- | --- |
+| Signal | A point-in-time record that the query matched (`type: signal`). Stored in `.rule-events`. | Rules in Detect mode |
+| Alert | A lifecycle-tracked episode with `type: alert` and `episode.*` fields. Stored in `.rule-events`. | Rules in Alert mode |
+
+A rule in Detect mode only writes signals. It never opens episodes, so action policies have nothing to match against.
+
+## Where alerts live
+
+Alert events are stored in `.rule-events`. Triage actions (acknowledge, snooze, resolve) are stored in `.alert-actions`. Both are queryable in Discover.
+
+<!-- TODO: Fix later
+Every time you take an action on an episode — acknowledging it, snoozing it, resolving it, editing its tags — {{kib}} writes a new document to `.alert-actions`. These documents are append-only and can be queried in Discover for auditing and metrics such as mean time to acknowledge (MTTA). For field definitions, refer to [Alert states and fields reference](alerts/alert-states-and-fields-reference.md#alert-states-reference).
+-->
+
+<!--[CONTENT NEEDED for M2: UI. "V2 Alerting Preview" is a development-phase navigation label. Once the navigation and page name have been confirmed, add instructions for opening the Alerts page.]
+-->
+
+### Data stream storage and retention
+
+Both `.rule-events` and `.alert-actions` are data streams: append-only, time-series stores optimized for writes. On every rule evaluation, {{kib}} writes a **new document** to `.rule-events` rather than updating the previous one. Each document is a point-in-time snapshot. The `episode.status` field records the lifecycle state the episode was in at that exact evaluation. Nothing is overwritten.
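+
+To replay one episode from these snapshots, a query along the following lines could be run in Discover. This is a sketch rather than a validated query: it assumes `.rule-events` stays directly queryable with {{esql}}, and `YOUR_EPISODE_ID` is a placeholder for a real `episode.id` value.
+
+```esql
+// Sketch only: YOUR_EPISODE_ID is a placeholder, not a real ID.
+FROM .rule-events
+// Keep only alert documents that belong to the episode of interest.
+| WHERE type == "alert" AND episode.id == "YOUR_EPISODE_ID"
+// Oldest evaluation first, so the rows read as a timeline.
+| SORT @timestamp ASC
+| KEEP @timestamp, status, episode.status
+```
+
+Each row is one evaluation, so a change in `episode.status` between consecutive rows marks a lifecycle transition.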
+
+<!-- TODO: Fix later
+Because every evaluation produces its own document, you can reconstruct the full history of an episode by querying all documents that share the same `episode.id`. Refer to [Query alerts and signals in Discover](alerts/query-alerts-and-signals-in-discover.md#explore-alerts-discover) for example queries.
+-->
+
+Retention is managed automatically through ILM. Older backing indices move through storage tiers and are deleted when the retention window expires. You do not need to manually remove documents. {{kib}} manages versioning, retention, and lifecycle for both streams. Do not change their mappings or index settings.
+
+## Related pages
+
+- **[View and manage alerts](alerts/view-and-manage-alerts.md):** Open the alert episodes table, triage active episodes, and acknowledge, snooze, or resolve them.
+- **[Query alerts and signals in Discover](alerts/query-alerts-and-signals-in-discover.md):** Use {{esql}} to query `.rule-events` and `.alert-actions` for ad hoc analysis and dashboards.
+- **[Alert states and fields reference](alerts/alert-states-and-fields-reference.md):** Look up lifecycle states, field names, and episode document structure.
+<!-- TODO: Uncomment when PR #6525 (workflows/notifications) is merged:
+- **[Notifications](notifications.md):** Set up action policies to route alert episodes to the right people and channels.
+- **[Notification gating](notifications/notification-gating.md):** Understand how acknowledge, snooze, and deactivate control whether an episode triggers a notification.
+-->
@@ -0,0 +1,58 @@
+---
+navigation_title: Alert states and fields
+applies_to:
+  stack: unavailable
+  serverless: preview
+products:
+  - id: kibana
+description: "Reference for episode status, `.rule-events` row status, and `.alert-actions` document fields in the experimental alerting features."
+---
+
+# Alert states and fields reference [alert-states-reference]
+
+
+Alert states and fields are part of the experimental alerting features in Kibana. Use these tables when you read alert UI state, query `.rule-events` or `.alert-actions` in Discover, or align API payloads with what operators see. For triage controls (acknowledge, snooze, resolve, tags) and how they map to storage, refer to [Alert actions](view-and-manage-alerts.md#alert-actions).
+<!-- TODO: Uncomment when PR #6523 (rules) is merged:
+For rule evaluation fields on `.rule-events`, refer to [Rule event and field reference](../rules/rule-event-field-reference.md#rule-reference).
+-->
+
+## Episode status
+
+The `episode.status` field appears on documents with `type: alert` in `.rule-events`. It records the lifecycle state the alert episode was in at the evaluation that wrote the document.
+
+| Value | Description |
+|---|---|
+| `inactive` | No breach is in progress. The episode has either not started or has fully recovered. |
+| `pending` | Condition met but activation thresholds not yet satisfied. |
+| `active` | Episode is actively breaching per rule logic. |
+| `recovering` | Condition clearing but recovery thresholds not yet satisfied. |
+
+<!--[CONTENT NEEDED for M2: M2 adds two first-class severity fields to the episode document:
+
+- `episode.severity` - The severity from the most recent rule event (current state). Updated each evaluation.
+- `episode.severity_max` - The highest severity seen over the episode's lifetime. Never decreases; enables "peaked at CRITICAL" display in the episode detail UI.
+
+Add a new table or rows for these fields once M2 ships. The episode detail UI is also expected to change to surface these fields directly. Several M2 details are still open: the enforced value set (or lack thereof), whether de-escalation triggers policy re-evaluation, and whether manual override is supported. Do not document specifics until resolved.]
+-->
+
+## Rule event status
+
+The `status` field appears on all documents in `.rule-events`, for both `type: signal` and `type: alert`. It reflects the outcome of a single rule evaluation row, independent of the episode lifecycle.
+
+| Value | Description |
+|---|---|
+| `breached` | Condition met for this evaluation row. |
+| `recovered` | Recovery path satisfied for this evaluation row. |
+| `no_data` | The rule's no-data handling recorded a no-data outcome for this evaluation row. |
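+
+As an illustration, the following sketch counts evaluation rows by document type and `status` for the selected time range. It assumes `.rule-events` remains directly queryable with {{esql}}, and the column name `evaluations` is chosen here for readability.
+
+```esql
+// Sketch only: counts rows per document type and evaluation outcome.
+FROM .rule-events
+| STATS evaluations = COUNT(*) BY type, status
+// Most frequent combinations first.
+| SORT evaluations DESC
+```
+
+A large share of `no_data` rows for a rule can be a prompt to review how that rule handles missing data.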
+
+## Alert action document fields
+
+When a user or the system records an action on an alert episode, {{kib}} writes a document to `.alert-actions`. Use this stream for triage history, operational metrics such as mean time to acknowledge (MTTA), and auditing. It does not store what your rule query returned on each run — that output is in `.rule-events`.
+
+| Field | Type | Description |
+|---|---|---|
+| `@timestamp` | date | When the action was recorded. |
+| `episode.id` | keyword | Target episode. |
+| `rule.id` | keyword | Rule that owns the episode. |
+| `action.type` | keyword | The action type, for example: <br>- `acknowledge`: User acknowledged the alert.<br>- `snooze`: Notifications snoozed for a period.<br>- `tag`: Tag applied to the alert.<br>- `fire`: Notification or escalation fired for the episode.<br>- `unmatched`: No action policy matched the episode, so no workflow ran for it under these policies. <br><br> For the full set of action types and UI behavior, refer to [Alert actions](view-and-manage-alerts.md#alert-actions). |
+| `episode.status_count` | long | Count of consecutive evaluations in the current `episode.status`. Only set when `episode.status` is `pending` or `recovering`.<br>For example, if the episode stays `pending` for three rule evaluations in a row, the value is `3`. |
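+
+The fields above can be combined into simple triage metrics. The following sketch counts acknowledgements per rule and reports when each rule was last acknowledged. It assumes `.alert-actions` remains directly queryable with {{esql}}, and the column names `acknowledgements` and `last_acknowledged` are illustrative.
+
+```esql
+// Sketch only: acknowledgement activity per rule.
+FROM .alert-actions
+| WHERE action.type == "acknowledge"
+| STATS acknowledgements = COUNT(*), last_acknowledged = MAX(@timestamp) BY rule.id
+| SORT acknowledgements DESC
+```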
@@ -0,0 +1,31 @@
+---
+navigation_title: Query alerts in Discover
+applies_to:
+  stack: unavailable
+  serverless: preview
+products:
+  - id: kibana
+description: "Use {{esql}} in Discover against `.rule-events` and `.alert-actions`: sample queries, trends, and MTTA-style analysis for the experimental alerting features."
+---
+
+# Query alerts and signals in Discover [explore-alerts-discover]
+
+
+Querying alerts and signals in Discover is part of the experimental alerting features in Kibana. Discover gives you direct {{esql}} access to everything the experimental alerting features record, including rule evaluation history, episode progressions, triage actions, and operational metrics like mean time to acknowledge.
+
+The Alerts UI shows current episode state. Discover lets you go further: ask arbitrary questions, spot trends over time, replay how a specific incident unfolded, or correlate alert history with other data in your environment.
+
+To use this page, open Discover, select {{esql}}, paste a query from the examples below, then adjust the time range and placeholders (`YOUR_RULE_ID`, `YOUR_GROUP_HASH`) to match your environment.
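+
+As a starting point, this sketch lists the most recent triage actions recorded for one rule. It is an unvalidated example that assumes `.alert-actions` is the query surface. Replace `YOUR_RULE_ID` with a rule ID from your environment.
+
+```esql
+// Sketch only: latest triage actions for a single rule.
+FROM .alert-actions
+| WHERE rule.id == "YOUR_RULE_ID"
+// Newest action first.
+| SORT @timestamp DESC
+| KEEP @timestamp, episode.id, action.type
+| LIMIT 50
+```
+
+Narrow the time range in Discover first, because the `LIMIT` applies only after the time filter.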
+
+<!--[CONTENT NEEDED: The queries on this page use `.rule-events` and `.alert-actions` directly. Confirm whether these will remain the intended query surface, or whether users should query an ES|QL view or a stable user-facing data stream instead. Update all examples accordingly before publishing.]
+-->
+
+<!--[CONTENT NEEDED for M2: Review and expand the query examples below once M2 field renames (`group_hash` → `series.key`, new `series.tracked_by`, `episode.severity`, `episode.severity_max`) are finalized. Add examples that take advantage of the new first-class severity and series fields.]
+-->
+
+<!-- TODO: Fix later
+For field names, types, and episode fields, refer to [Alert states and fields reference](alert-states-and-fields-reference.md#alert-states-reference). For triage in the product UI, refer to [View, manage, and reference alerts](view-and-manage-alerts.md).
+-->
+<!-- TODO: Uncomment when PR #6523 (rules) is merged and restore full sentence:
+For field names, types, and episode fields, refer to [Alert states and fields reference](alert-states-and-fields-reference.md#alert-states-reference) and [Rule event and field reference](../rules/rule-event-field-reference.md#rule-reference). For triage in the product UI, refer to [View, manage, and reference alerts](view-and-manage-alerts.md).
+-->
+