[BUG] Workflow Validation Fails for Detectors with More Than 10 Rules #2144

@thecodingshrimp

What is the bug?

Creating a Security Analytics detector through the OpenSearch Create Detector API fails when the detector references enough rules to generate more than 10 delegate monitors in the underlying Alerting workflow. The request fails with a 400 Bad Request and reports that some monitor IDs are invalid, even though those monitor IDs were created correctly.

Example error:

Error 400 (Bad Request): 0uhbOp4BCrnc_R8wysGB, 1-hbOp4BCrnc_R8wysGD, 2uhbOp4BCrnc_R8wysGG are not valid monitor ids [type=security_analytics_exception]

The root cause is in Alerting workflow validation, not in Security Analytics monitor ID generation.

How can one reproduce the bug?

Steps to reproduce the behavior:

  1. Start an OpenSearch cluster with the Security Analytics and Alerting plugins enabled.
  2. Ensure you have enough Security Analytics rule IDs available for a detector input to produce more than 10 delegate monitors (the example below uses 12).
  3. Send a detector creation request to the Create Detector API:
    POST _plugins/_security_analytics/detectors
  4. Use a request body that includes enough pre_packaged_rules and/or custom_rules to produce more than 10 delegate monitors in the generated workflow.
  5. Observe that detector creation fails with:
    Error 400 (Bad Request): <monitor_ids> are not valid monitor ids [type=security_analytics_exception]
    

Example reproduction request shape:

POST _plugins/_security_analytics/detectors
{
  "enabled": true,
  "schedule": {
    "period": {
      "interval": 1,
      "unit": "MINUTES"
    }
  },
  "detector_type": "WINDOWS",
  "type": "detector",
  "inputs": [
    {
      "detector_input": {
        "description": "windows detector for security analytics",
        "indices": ["windows"],
        "pre_packaged_rules": [
          { "id": "rule-id-01" },
          { "id": "rule-id-02" },
          { "id": "rule-id-03" },
          { "id": "rule-id-04" },
          { "id": "rule-id-05" },
          { "id": "rule-id-06" },
          { "id": "rule-id-07" },
          { "id": "rule-id-08" },
          { "id": "rule-id-09" },
          { "id": "rule-id-10" },
          { "id": "rule-id-11" },
          { "id": "rule-id-12" }
        ]
      }
    }
  ],
  "triggers": [
    {
      "ids": [
        "rule-id-01"
      ],
      "types": [],
      "tags": [],
      "severity": "1",
      "actions": [],
      "id": "test-trigger-id",
      "sev_levels": [],
      "name": "test-trigger"
    }
  ],
  "name": "detector-with-many-rules"
}
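
To vary the number of rules while looking for the failure threshold, the pre_packaged_rules array can be generated rather than typed by hand. The following is a minimal sketch: prePackagedRules is a hypothetical helper, and the "rule-id-NN" values are the same placeholders used in the request above, so real rule IDs from the cluster must be substituted before sending the request.

```kotlin
// Hypothetical helper: renders a pre_packaged_rules JSON array with `count` placeholder IDs.
// The "rule-id-NN" values are placeholders, not real Security Analytics rule IDs.
fun prePackagedRules(count: Int): String =
    (1..count).joinToString(separator = ",\n", prefix = "[\n", postfix = "\n]") { n ->
        "  { \"id\": \"rule-id-%02d\" }".format(n)
    }

fun main() {
    // Print an array with 12 entries, matching the request shape above.
    println(prePackagedRules(12))
}
```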

What is the expected behavior?

The detector should be created successfully, and all generated monitor IDs should be validated correctly, even when the detector contains enough rules to create more than 10 workflow delegates.

What is your host/environment?

  • OS: macOS
  • Version: Tahoe
  • Plugins:
    • opensearch-alerting
    • opensearch-security-analytics

Do you have any screenshots?

No screenshots. The failure is returned directly by the detector creation API as a JSON error response.

Do you have any additional context?

This is an Alerting bug in workflow validation.

Root cause:

  • File: alerting/alerting/src/main/kotlin/org/opensearch/alerting/transport/TransportIndexWorkflowAction.kt
  • Function: validateMonitorAccess()

The validation code searches for workflow delegate monitor IDs using a termsQuery("_id", monitorIds) but does not set the search size explicitly. Because OpenSearch defaults the search size to 10, at most 10 monitor documents are returned. When more than 10 delegate monitors exist, the remaining valid monitor IDs are missing from the search results and are incorrectly reported as invalid.

Current logic (line 716):

val searchSource = SearchSourceBuilder().query(query)

Fixed logic:

val searchSource = SearchSourceBuilder().query(query).size(monitorIds.size)

This ensures that all delegate monitor IDs requested for validation are returned and checked.
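
The truncation described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not the Alerting plugin's code: searchMonitorIds and findInvalidMonitorIds are hypothetical stand-ins for the terms query and the ID comparison, and the model returns hits in request order, whereas a real search may drop a different subset of IDs.

```kotlin
// Simplified model of the validation step; NOT the actual Alerting implementation.
// Mirrors the OpenSearch default of 10 hits when no size is set.
const val DEFAULT_SEARCH_SIZE = 10

// Hypothetical stand-in for a terms query on _id: returns at most `size` matching IDs.
fun searchMonitorIds(
    index: Set<String>,
    requested: List<String>,
    size: Int = DEFAULT_SEARCH_SIZE
): List<String> = requested.filter { it in index }.take(size)

// Requested IDs that the search did not return; the validation reports these as invalid.
fun findInvalidMonitorIds(
    index: Set<String>,
    requested: List<String>,
    size: Int = DEFAULT_SEARCH_SIZE
): List<String> {
    val found = searchMonitorIds(index, requested, size).toSet()
    return requested.filterNot { it in found }
}

fun main() {
    val monitorIds = (1..12).map { "monitor-%02d".format(it) }
    val index = monitorIds.toSet() // every requested monitor exists

    // Default size: valid IDs beyond the first 10 hits are reported as invalid.
    println(findInvalidMonitorIds(index, monitorIds))

    // Explicit size, as in the proposed fix: no valid ID is reported as invalid.
    println(findInvalidMonitorIds(index, monitorIds, size = monitorIds.size))
}
```

With 12 existing monitors, the model reports two valid IDs as invalid under the default size and none once the size matches the number of requested IDs.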

Impact:

  • Detector creation fails for sufficiently large rule sets.
  • The failure misleadingly suggests invalid monitor IDs.
  • The bug is in Alerting workflow validation, not in Security Analytics detector creation logic.
