From 08eca00685dbd9ffa65e3e292533ee22c4f9f5a9 Mon Sep 17 00:00:00 2001
From: Omri Rosner
Date: Wed, 24 Jun 2026 12:07:59 +0300
Subject: [PATCH 1/2] docs: document workflow loop guardrails

Document loop-level and iteration-level safety controls for foreach and while workflow steps.
---
 .../pass-data-handle-errors.md                |  2 +-
 .../workflows/steps/flow-control-steps.md     |  8 +++-
 explore-analyze/workflows/steps/foreach.md    | 48 +++++++++++++++++++
 explore-analyze/workflows/steps/while.md      | 26 +++++++++-
 4 files changed, 80 insertions(+), 4 deletions(-)

diff --git a/explore-analyze/workflows/authoring-techniques/pass-data-handle-errors.md b/explore-analyze/workflows/authoring-techniques/pass-data-handle-errors.md
index e22eeafb44..4f1980f95f 100644
--- a/explore-analyze/workflows/authoring-techniques/pass-data-handle-errors.md
+++ b/explore-analyze/workflows/authoring-techniques/pass-data-handle-errors.md
@@ -198,9 +198,9 @@ In the following example:
 
 ### Restrictions [workflows-on-failure-restrictions]
 
-- Flow-control steps (`if`, `foreach`) cannot have workflow-level `on-failure` configurations.
 - Fallback steps run only after all retries have been exhausted.
 - When combined, failure-handling options are processed in this order: retry → fallback → continue.
+- For `foreach` and `while` loops, use `iteration-on-failure` when you want to handle one failed iteration without failing the whole loop.
 
 ### Handle failures across workflows [workflows-cross-workflow-handler]
 
diff --git a/explore-analyze/workflows/steps/flow-control-steps.md b/explore-analyze/workflows/steps/flow-control-steps.md
index fc3f5727b4..25b1c4e0ab 100644
--- a/explore-analyze/workflows/steps/flow-control-steps.md
+++ b/explore-analyze/workflows/steps/flow-control-steps.md
@@ -54,12 +54,16 @@ For expression syntax and additional examples, refer to [If step](/explore-analy
 
 ## `foreach` [foreach]
 
-Iterate over an array, running nested steps once per item. Inside the loop, the current item is available as `foreach.item`, the zero-based position as `foreach.index`, and the total count as `foreach.total`.
+Iterate over an array, running nested steps once per item. Inside the loop, the current item is available as `foreach.item`, the zero-based position as `foreach.index`, and the total count as `foreach.total`. Use `max-iterations`, `timeout`, `on-failure`, and `if` to guard the loop as a whole; use `iteration-timeout` and `iteration-on-failure` to guard each iteration.
 
 ```yaml
 - name: process_alerts
   type: foreach
   foreach: "${{ event.alerts }}"
+  max-iterations:
+    limit: 100
+    on-limit: fail
+  iteration-timeout: "30s"
   steps:
     - name: log_alert
       type: console
@@ -71,7 +75,7 @@ For the full parameter reference, refer to [Foreach step](/explore-analyze/workf
 
 ## `while` [while]
 
-Loop while a KQL condition evaluates to true. The `max-iterations` field caps the number of iterations and defaults to **2000**. The default `on-limit` behavior is `continue`, which means the step succeeds quietly when the cap is reached. To fail the workflow on the cap instead, use the object form with `on-limit: fail`.
+Loop while a KQL condition evaluates to true. The `max-iterations` field caps the number of iterations and defaults to **2000**. The default `on-limit` behavior is `continue`, which means the step succeeds quietly when the cap is reached. To fail the workflow on the cap instead, use the object form with `on-limit: fail`. Like `foreach`, `while` also supports loop-level `timeout`, `on-failure`, and `if`, plus per-iteration `iteration-timeout` and `iteration-on-failure`.
 
 ```yaml
 - name: poll_until_ready
diff --git a/explore-analyze/workflows/steps/foreach.md b/explore-analyze/workflows/steps/foreach.md
index 229cb83470..c76ec8d39a 100644
--- a/explore-analyze/workflows/steps/foreach.md
+++ b/explore-analyze/workflows/steps/foreach.md
@@ -24,12 +24,21 @@ Use the following parameters to configure a `foreach` step:
 | `type` | Yes | Step type - must be `foreach` |
 | `foreach` | Yes | A template or JSON expression that evaluates to an array |
 | `steps` | Yes | An array of steps to run for each iteration |
+| `if` | No | KQL expression that skips the entire loop when it evaluates to false |
+| `max-iterations` | No | Maximum number of iterations to run. Use a number or an object with `limit` and `on-limit`. |
+| `timeout` | No | Timeout for the entire loop |
+| `on-failure` | No | Loop-level failure handling policy. Supports the same `continue`, `retry`, and `fallback` options as other steps. |
+| `iteration-timeout` | No | Timeout for each individual iteration |
+| `iteration-on-failure` | No | Per-iteration failure handling policy. Supports `continue`, `retry`, and `fallback` without failing the whole loop. |
 
 ```yaml
 steps:
   - name: loopStep
     type: foreach
     foreach:
+    max-iterations:
+      limit: 100
+      on-limit: fail
     steps:
       # Steps to run for each item
       # Current item is available as 'foreach.item'
@@ -39,6 +48,45 @@ steps:
 Inside the loop, the current item is always available as `foreach.item`. You cannot customize this variable name.
 ::::
 
+## Guardrails
+
+```{applies_to}
+stack: ga 9.4+
+serverless: ga
+```
+
+Use loop-level guardrails to control the `foreach` step as a whole:
+
+* `max-iterations` prevents unexpectedly large collections from creating runaway loops. A bare number is treated as `{ limit: N, on-limit: continue }`, so the loop succeeds when the limit is reached. Use the object form with `on-limit: fail` when reaching the cap should fail the workflow.
+* `timeout` limits the total time spent in the loop, across all iterations.
+* `on-failure` defines loop-level failure handling with the same `continue`, `retry`, and `fallback` options used by other steps.
+* `if` skips the entire loop when the condition evaluates to false.
+
+Use iteration-level guardrails to control each pass through the loop:
+
+* `iteration-timeout` limits how long one iteration can run.
+* `iteration-on-failure` handles failures for one iteration with `continue`, `retry`, or `fallback` without failing the whole loop.
+
+```yaml
+steps:
+  - name: processAlerts
+    type: foreach
+    foreach: "${{ event.alerts }}"
+    if: "inputs.process_alerts : true"
+    max-iterations:
+      limit: 100
+      on-limit: fail
+    timeout: "10m"
+    iteration-timeout: "30s"
+    iteration-on-failure:
+      continue: true
+    steps:
+      - name: logAlert
+        type: console
+        with:
+          message: "Processing alert {{ foreach.item._id }}"
+```
+
 The `foreach` field supports the following expression types:
 
 * [Template expressions](#template-expressions)
diff --git a/explore-analyze/workflows/steps/while.md b/explore-analyze/workflows/steps/while.md
index c709f31014..ba64f9aca8 100644
--- a/explore-analyze/workflows/steps/while.md
+++ b/explore-analyze/workflows/steps/while.md
@@ -30,14 +30,31 @@ Use `while` for polling patterns: checking a status until it reaches `ready`, re
 | `type` | top level | string | Yes | Must be `while`. |
 | `condition` | top level | string | Yes | KQL expression evaluated each iteration. The loop continues while it is true. |
 | `steps` | top level | array | Yes | Loop body. |
+| `if` | top level | string | No | KQL expression that skips the entire loop when it evaluates to false. |
 | `max-iterations` | top level | number or object | No | Limit for number of iterations. Default is **2000**. Bare number is treated as `{ limit: N, on-limit: continue }`. Use the object form to opt into `on-limit: fail`. |
+| `timeout` | top level | duration | No | Timeout for the entire loop. |
+| `on-failure` | top level | object | No | Loop-level error-handling policy. Supports the same `continue`, `retry`, and `fallback` options as other steps. |
 | `iteration-timeout` | top level | duration | No | Per-iteration timeout. |
-| `iteration-on-failure` | top level | object | No | Per-iteration error-handling policy. Same shape as `on-failure`. |
+| `iteration-on-failure` | top level | object | No | Per-iteration error-handling policy. Supports `continue`, `retry`, and `fallback` without failing the whole loop. |
 
 :::{warning}
 `while` defaults to `max-iterations: 2000` with `on-limit: continue`, which means the step **succeeds quietly when the cap is reached**. If your loop needs to fail the workflow on hitting the cap, set `on-limit: fail` explicitly. Always think about the cap on loops that depend on external state to avoid runaway executions or silently truncated work.
 :::
 
+## Guardrails
+
+Use loop-level guardrails to control the `while` step as a whole:
+
+* `max-iterations` prevents runaway loops. The default is **2000** iterations. A bare number is treated as `{ limit: N, on-limit: continue }`, so the loop succeeds when the limit is reached. Use the object form with `on-limit: fail` when reaching the cap should fail the workflow.
+* `timeout` limits the total time spent in the loop, across all iterations.
+* `on-failure` defines loop-level failure handling with the same `continue`, `retry`, and `fallback` options used by other steps.
+* `if` skips the entire loop when the condition evaluates to false.
+
+Use iteration-level guardrails to control each pass through the loop:
+
+* `iteration-timeout` limits how long one iteration can run.
+* `iteration-on-failure` handles failures for one iteration with `continue`, `retry`, or `fallback` without failing the whole loop.
+
 ### `max-iterations` shape
 
 ```yaml
@@ -63,10 +80,17 @@ Inside the `steps` block of a `while`, the following variables are available:
 ```yaml
 - name: poll_job
   type: while
+  if: "inputs.poll : true"
   condition: "steps.check.output.status : pending"
   max-iterations:
     limit: 60
     on-limit: fail
+  timeout: "10m"
+  iteration-timeout: "30s"
+  iteration-on-failure:
+    retry:
+      max-attempts: 2
+      delay: "1s"
   steps:
     - name: check
       type: elasticsearch.search

From 2ac6ac9d1434eafed5b8c1324c962358636a4d5c Mon Sep 17 00:00:00 2001
From: Omri Rosner
Date: Thu, 25 Jun 2026 16:27:02 +0300
Subject: [PATCH 2/2] docs: address workflow loop guardrail review

Apply review feedback to clarify loop guardrail references and keep overview docs concise.
---
 .../pass-data-handle-errors.md                |  1 +
 .../workflows/steps/flow-control-steps.md     |  4 +-
 explore-analyze/workflows/steps/foreach.md    | 77 +++++++++----------
 explore-analyze/workflows/steps/while.md      |  2 +-
 4 files changed, 42 insertions(+), 42 deletions(-)

diff --git a/explore-analyze/workflows/authoring-techniques/pass-data-handle-errors.md b/explore-analyze/workflows/authoring-techniques/pass-data-handle-errors.md
index 4f1980f95f..6dea66889e 100644
--- a/explore-analyze/workflows/authoring-techniques/pass-data-handle-errors.md
+++ b/explore-analyze/workflows/authoring-techniques/pass-data-handle-errors.md
@@ -198,6 +198,7 @@ In the following example:
 
 ### Restrictions [workflows-on-failure-restrictions]
 
+- The `if` step cannot have an `on-failure` configuration. (`foreach` and `while` loops do support loop-level `on-failure`; see their step references.)
 - Fallback steps run only after all retries have been exhausted.
 - When combined, failure-handling options are processed in this order: retry → fallback → continue.
 - For `foreach` and `while` loops, use `iteration-on-failure` when you want to handle one failed iteration without failing the whole loop.
diff --git a/explore-analyze/workflows/steps/flow-control-steps.md b/explore-analyze/workflows/steps/flow-control-steps.md
index 25b1c4e0ab..acca5787f3 100644
--- a/explore-analyze/workflows/steps/flow-control-steps.md
+++ b/explore-analyze/workflows/steps/flow-control-steps.md
@@ -54,7 +54,7 @@ For expression syntax and additional examples, refer to [If step](/explore-analy
 
 ## `foreach` [foreach]
 
-Iterate over an array, running nested steps once per item. Inside the loop, the current item is available as `foreach.item`, the zero-based position as `foreach.index`, and the total count as `foreach.total`. Use `max-iterations`, `timeout`, `on-failure`, and `if` to guard the loop as a whole; use `iteration-timeout` and `iteration-on-failure` to guard each iteration.
+Iterate over an array, running nested steps once per item. Inside the loop, the current item is available as `foreach.item`, the zero-based position as `foreach.index`, and the total count as `foreach.total`. It also supports guardrails to cap iterations, set timeouts, and handle failures.
 
 ```yaml
 - name: process_alerts
@@ -75,7 +75,7 @@ For the full parameter reference, refer to [Foreach step](/explore-analyze/workf
 
 ## `while` [while]
 
-Loop while a KQL condition evaluates to true. The `max-iterations` field caps the number of iterations and defaults to **2000**. The default `on-limit` behavior is `continue`, which means the step succeeds quietly when the cap is reached. To fail the workflow on the cap instead, use the object form with `on-limit: fail`. Like `foreach`, `while` also supports loop-level `timeout`, `on-failure`, and `if`, plus per-iteration `iteration-timeout` and `iteration-on-failure`.
+Loop while a KQL condition evaluates to true. The `max-iterations` field caps the number of iterations and defaults to **2000**. The default `on-limit` behavior is `continue`, which means the step succeeds quietly when the cap is reached. To fail the workflow on the cap instead, use the object form with `on-limit: fail`. Like `foreach`, it also supports loop-level guardrails and per-iteration controls.
 
 ```yaml
 - name: poll_until_ready
diff --git a/explore-analyze/workflows/steps/foreach.md b/explore-analyze/workflows/steps/foreach.md
index c76ec8d39a..4215d6c29a 100644
--- a/explore-analyze/workflows/steps/foreach.md
+++ b/explore-analyze/workflows/steps/foreach.md
@@ -48,45 +48,6 @@ steps:
 Inside the loop, the current item is always available as `foreach.item`. You cannot customize this variable name.
 ::::
 
-## Guardrails
-
-```{applies_to}
-stack: ga 9.4+
-serverless: ga
-```
-
-Use loop-level guardrails to control the `foreach` step as a whole:
-
-* `max-iterations` prevents unexpectedly large collections from creating runaway loops. A bare number is treated as `{ limit: N, on-limit: continue }`, so the loop succeeds when the limit is reached. Use the object form with `on-limit: fail` when reaching the cap should fail the workflow.
-* `timeout` limits the total time spent in the loop, across all iterations.
-* `on-failure` defines loop-level failure handling with the same `continue`, `retry`, and `fallback` options used by other steps.
-* `if` skips the entire loop when the condition evaluates to false.
-
-Use iteration-level guardrails to control each pass through the loop:
-
-* `iteration-timeout` limits how long one iteration can run.
-* `iteration-on-failure` handles failures for one iteration with `continue`, `retry`, or `fallback` without failing the whole loop.
-
-```yaml
-steps:
-  - name: processAlerts
-    type: foreach
-    foreach: "${{ event.alerts }}"
-    if: "inputs.process_alerts : true"
-    max-iterations:
-      limit: 100
-      on-limit: fail
-    timeout: "10m"
-    iteration-timeout: "30s"
-    iteration-on-failure:
-      continue: true
-    steps:
-      - name: logAlert
-        type: console
-        with:
-          message: "Processing alert {{ foreach.item._id }}"
-```
-
 The `foreach` field supports the following expression types:
 
 * [Template expressions](#template-expressions)
@@ -167,6 +128,44 @@ Template expressions support bracket notation for keys that contain dots or othe
 "{{ foreach.item['service.name'] }}"
 ```
 
+## Guardrails
+
+```{applies_to}
+stack: ga 9.4+
+serverless: ga
+```
+
+Use loop-level guardrails to control the `foreach` step as a whole:
+
+* `max-iterations` caps how many items the loop processes and defaults to **2000** with `on-limit: continue`, so a larger collection is silently truncated unless you raise the limit or set `on-limit: fail`. A bare number is shorthand for `{ limit: N, on-limit: continue }`; use the object form with `on-limit: fail` to fail the workflow when the cap is reached.
+* `timeout` limits the total time spent in the loop, across all iterations.
+* `on-failure` defines loop-level failure handling with the same `continue`, `retry`, and `fallback` options used by other steps.
+* `if` skips the entire loop when the condition evaluates to false.
+
+Use iteration-level guardrails to control each pass through the loop:
+
+* `iteration-timeout` limits how long one iteration can run.
+* `iteration-on-failure` handles failures for one iteration with `continue`, `retry`, or `fallback` without failing the whole loop.
+
+```yaml
+steps:
+  - name: processAlerts
+    type: foreach
+    foreach: "${{ event.alerts }}"
+    if: "inputs.process_alerts : true"
+    max-iterations:
+      limit: 100
+      on-limit: fail
+    timeout: "10m"
+    iteration-timeout: "30s"
+    iteration-on-failure:
+      continue: true
+    steps:
+      - name: logAlert
+        type: console
+        with:
+          message: "Processing alert {{ foreach.item._id }}"
+```
 
 ## Example: Process search results
 
diff --git a/explore-analyze/workflows/steps/while.md b/explore-analyze/workflows/steps/while.md
index ba64f9aca8..c9d6fbe946 100644
--- a/explore-analyze/workflows/steps/while.md
+++ b/explore-analyze/workflows/steps/while.md
@@ -45,7 +45,7 @@ Use `while` for polling patterns: checking a status until it reaches `ready`, re
 
 Use loop-level guardrails to control the `while` step as a whole:
 
-* `max-iterations` prevents runaway loops. The default is **2000** iterations. A bare number is treated as `{ limit: N, on-limit: continue }`, so the loop succeeds when the limit is reached. Use the object form with `on-limit: fail` when reaching the cap should fail the workflow.
+* `max-iterations` caps the number of iterations to prevent runaway loops.
 * `timeout` limits the total time spent in the loop, across all iterations.
 * `on-failure` defines loop-level failure handling with the same `continue`, `retry`, and `fallback` options used by other steps.
 * `if` skips the entire loop when the condition evaluates to false.
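
The series describes two accepted shapes for `max-iterations` but never shows them side by side. The fragment below is a minimal sketch of that contrast and is not part of either commit. It uses only fields named in the diff; the step names `capped_quietly`, `capped_strictly`, and `log_item` and the console messages are hypothetical.

```yaml
# Illustrative sketch only: step names and messages are assumptions, not from the patch.
steps:
  - name: capped_quietly
    type: foreach
    foreach: "${{ event.alerts }}"
    # Bare number: documented as shorthand for { limit: 50, on-limit: continue },
    # so the step succeeds when the cap is reached and remaining items are skipped.
    max-iterations: 50
    steps:
      - name: log_item
        type: console
        with:
          message: "Alert {{ foreach.index }} of {{ foreach.total }}"

  - name: capped_strictly
    type: foreach
    foreach: "${{ event.alerts }}"
    # Object form: reaching the cap fails the workflow instead of succeeding quietly.
    max-iterations:
      limit: 50
      on-limit: fail
    steps:
      - name: log_item
        type: console
        with:
          message: "Alert {{ foreach.index }} of {{ foreach.total }}"
```

If the documented semantics hold, the first step would suit best-effort processing where truncation is acceptable, and the second would suit cases where unprocessed items should surface as a failure.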