diff --git a/solutions/observability/streams-new/configure-retention.md b/solutions/observability/streams-new/configure-retention.md new file mode 100644 index 0000000000..66645bcee9 --- /dev/null +++ b/solutions/observability/streams-new/configure-retention.md @@ -0,0 +1,174 @@ +--- +applies_to: + serverless: ga + stack: preview =9.1, ga 9.2+ +description: Learn how to configure data retention policies for your streams. +products: + - id: observability + - id: elasticsearch + - id: kibana + - id: cloud-serverless + - id: cloud-hosted + - id: cloud-enterprise + - id: cloud-kubernetes + - id: elastic-stack +--- + +# Configure retention [streams-configure-retention] + +Managing data retention across multiple indexes typically requires configuring several different settings—{{ilm}} ({{ilm-init}}), data stream lifecycle, index templates, and index settings—each in a different place. Streams replaces this with a single UI so you can efficiently control storage costs and meet regulatory or compliance requirements. + +The Streams **Retention** tab provides a single place to manage lifecycle policies for your streams: + +- **Set retention periods per stream**: Configure how long each stream retains data without touching {{ilm-init}} policies, index templates, or index settings directly. +- **Cascade retention to child streams**: For wired streams, parent retention policies automatically apply to child streams. Override at the child level when a specific stream needs different settings. +- **Monitor storage in one view**: See storage size, ingestion averages, and tier distribution so you can align retention periods with storage costs and compliance requirements. 
+ +## Required permissions [streams-configure-retention-permissions] + +To edit data retention in {{stack}}, you need the following data stream level privileges: + +- `manage_data_stream_lifecycle` +- `manage_ilm` + +For more information, refer to [Granting privileges for data streams and aliases](../../../deploy-manage/users-roles/cluster-or-deployment-auth/granting-privileges-for-data-streams-aliases.md). + +## Configure retention [streams-configure-retention-steps] + +Use the following steps to review your stream's storage footprint, choose a retention method, and apply it. + +:::::::{stepper} + +::::::{step} Review storage and ingestion data + +Select a stream and open its **Retention** tab. Before setting a retention policy, review the following panels to understand your data's footprint: + +- **Storage size**: Total data volume and document count for the stream. +- **Ingestion averages**: Estimated ingestion per day and per month, based on total stream size divided by stream age. +- **Data lifecycle** or **{{ilm-init}} policy data tiers**: The amount of data in each phase (Hot, Warm, Cold, Frozen) so you can see where data is accumulating. +- **Ingestion over time**: A chart of estimated ingestion volume over time to help spot trends or spikes. + +Use this information to decide how long you need to retain data and which retention method best fits your cost and compliance requirements. + +For more information on data retention, refer to [Data stream lifecycle](../../../manage-data/lifecycle/data-stream.md). +:::::: + +::::::{step} Choose and configure a retention method + +Select **Edit retention method** to open the configuration options, then choose one of the following methods: + +:::::{tab-set} + +::::{tab-item} Inherit retention + +The stream uses retention settings from its index template (classic streams) or parent stream (wired streams). No custom period or policy is needed. 
+ +**Classic streams** default to the data stream's existing index template's data retention configuration. When a stream inherits retention settings from an index template, Streams doesn't manage retention. This is useful when onboarding existing data streams and preserving their lifecycle behavior while still benefiting from Streams' visibility and {{monitor-features}}. + +**Wired streams** {applies_to}`serverless: preview` {applies_to}`stack: preview 9.2+` follow a hierarchical structure that supports inheritance. A child stream can inherit the lifecycle of its nearest ancestor that has a set {{ilm-init}} or retention period policy. When the ancestor's lifecycle is updated, Streams cascades the change to all child streams that inherit it. + +To enable inheritance, turn on **Inherit from index template** or **parent stream** in the **Edit retention method** options. +:::: + +::::{tab-item} Set a retention period + +Set the minimum number of days after which data is deleted. Data stays in the hot phase for best indexing and search performance. + +To set a specific retention period: + +1. Select **Edit retention method**. +1. Turn off **Inherit from index template** or **parent stream**, if enabled. +1. Select **Custom period**. +1. Set the number of days you want to retain data. + +To define a global default retention policy for serverless projects, refer to [project settings](../../../deploy-manage/deploy/elastic-cloud/project-settings.md). +:::: + +::::{tab-item} Follow an {{ilm-init}} policy + +```{applies_to} +serverless: unavailable +stack: preview =9.1, ga 9.2+ +``` + +Select an existing {{ilm-init}} policy to automate how data moves through phases (hot, warm, cold) as it ages. {{ilm-init}} policies let you standardize data retention across Streams and other data streams. + +To follow an existing policy: + +1. Select **Edit retention method**. +1. Turn off **Inherit from index template** or **parent stream**, if enabled. +1. 
Select **{{ilm-init}} policy**, then choose a pre-defined policy from the list. + +If the policy you need doesn't exist, refer to [Configure a lifecycle policy](../../../manage-data/lifecycle/index-lifecycle-management/configure-lifecycle-policy.md) to create one. +:::: + +::::: + +:::::: + +::::::{step} Configure data lifecycle phases + +```{applies_to} +stack: ga 9.4+ +``` + +When a stream follows an {{ilm-init}} policy, the **Data lifecycle** panel shows the phases defined in that policy as a visual bar. You can edit existing phases or add new ones directly from the **Retention** tab: + +- To edit an existing phase, select the phase in the **Data lifecycle** panel and click {icon}`pencil`. +- To add a phase, click **Add data phase** and select a phase. + +This opens the **Edit data phases** window where you can configure or update your phases. The following phases are available: + +**Hot** +: The index is actively updated and queried. This is the default phase for all data. Options include enabling read-only access and [downsampling](#streams-configure-retention-downsampling). + +**Warm** +: The index is updated infrequently but still queried. Set the minimum age for data to move into this phase. Options include enabling read-only access and [downsampling](#streams-configure-retention-downsampling). + +**Cold** +: The index is rarely updated or queried, and slower query performance is acceptable. Set the minimum age for data to move into this phase. Options include enabling read-only access, [downsampling](#streams-configure-retention-downsampling), and [{{search-snaps}}](#streams-configure-retention-searchable-snapshots). + +**Frozen** +: The index is no longer updated and is queried rarely. Optimized for long-term retention at the lowest possible cost. Set the minimum age for data to move into this phase and configure a snapshot repository. The frozen phase requires a snapshot repository. + +**Delete** +: Remove the index after a specified period of time. 
Set how long data is stored before deletion and optionally delete any associated [{{search-snaps}}](#streams-configure-retention-searchable-snapshots). + +For more information on {{ilm-init}} phases and available actions, refer to [Index lifecycle](../../../manage-data/lifecycle/index-lifecycle-management/index-lifecycle.md). + +### Downsampling [streams-configure-retention-downsampling] + +Downsampling reduces storage for time series data by replacing original metrics with statistical summaries at a higher sampling interval. For example, metrics sampled every 10 seconds can be consolidated into hourly data points as the data ages, significantly reducing storage while keeping the data queryable. + +Downsampling is available in the Hot, Warm, and Cold phases and only applies to time series data streams. + +For more information, refer to [Downsampling concepts](../../../manage-data/data-store/data-streams/downsampling-concepts.md). + +### {{search-snaps-cap}} [streams-configure-retention-searchable-snapshots] + +{{search-snaps-cap}} let you search infrequently accessed, read-only data directly from a snapshot repository without needing replica shards, significantly reducing storage costs. They are best suited for archival or historical data that requires infrequent access. + +{{search-snaps-cap}} are available in the Cold and Frozen phases. + +For more information, refer to [Searchable snapshots](../../../deploy-manage/tools/snapshot-and-restore/searchable-snapshots.md). +:::::: + +::::::{step} Apply retention to child streams + +```{applies_to} +serverless: preview +stack: preview 9.2+ +``` + +For wired streams, retention policies cascade automatically from parent to child streams. When you update a parent stream's retention policy, Streams propagates the change to all child streams that inherit from it. + +To override retention for a specific child stream, open that stream's **Retention** tab and configure a different method. 
The child stream uses its own policy instead of inheriting from the parent. +:::::: + +::::::: + +## Set failure store retention [streams-configure-failure-store-retention] + +A [failure store](../../../manage-data/data-store/data-streams/failure-store.md) is a secondary set of indices inside a data stream, dedicated to storing failed documents. + +You can enable and configure failure store retention directly from the **Retention** tab. Select **Enable failure store** to turn it on and set the retention period for failed documents. diff --git a/solutions/observability/streams-new/get-data-in.md b/solutions/observability/streams-new/get-data-in.md new file mode 100644 index 0000000000..6a112ffddd --- /dev/null +++ b/solutions/observability/streams-new/get-data-in.md @@ -0,0 +1,206 @@ +--- +applies_to: + serverless: ga + stack: preview =9.1, ga 9.2+ +description: Learn how to get data into Streams. +products: + - id: observability + - id: elasticsearch + - id: kibana + - id: cloud-serverless + - id: cloud-hosted + - id: cloud-enterprise + - id: cloud-kubernetes + - id: elastic-stack +--- + +# Get data in + +Streams supports two entry points depending on where your data is today: + +- **[Ingest new data](#get-data-in-wired)**: Send logs to a managed endpoint for new ingestion. Best for new deployments, custom logs, and mixed-format sources. +- **[Work with existing data](#get-data-in-classic)**: Work with data already flowing into {{es}}. No migration or configuration changes required. 
+ +## Prerequisites [get-data-in-prerequisites] + +Streams requires the following permissions: + +::::{applies-switch} + +:::{applies-item} serverless: +Streams requires these {{serverless-full}} roles: + +- Admin: Ability to manage all Streams +- Editor/Viewer: Limited access, cannot perform all actions + +::: + +:::{applies-item} stack: +To manage all streams, you need the following permissions: + +- **Cluster permissions**: `manage_index_templates`, `manage_ingest_pipelines`, `manage_pipeline`, `read_pipeline` +- **Data stream level permissions**: `read`, `write`, `create`, `manage`, `monitor`, `manage_data_stream_lifecycle`, `read_failure_store`, `manage_failure_store`, `manage_ilm` + +To view streams, you need the following permissions: +- **Data stream level**: `read`, `view_index_metadata`, `monitor` + +For more information, refer to [Cluster privileges](elasticsearch://reference/elasticsearch/security-privileges.md#privileges-list-cluster) and [Granting privileges for data streams and aliases](../../../deploy-manage/users-roles/cluster-or-deployment-auth/granting-privileges-for-data-streams-aliases.md). + +::: + +:::: + +## Ingest new data [get-data-in-wired] + +```{applies_to} +stack: preview 9.2+ +serverless: preview +``` + +Wired streams send your documents to a managed endpoint, from which you can route data into child streams based on partitioning rules. Child streams automatically inherit mappings, lifecycle settings, and processors from the parent, and configuration changes propagate through the hierarchy. + +To send data to a wired stream, configure your shipper to point to the appropriate endpoint: + +:::::{tab-set} + +::::{tab-item} OpenTelemetry +:::{note} +Set the index based on your {{stack}} version: + +- {applies_to}`stack: preview 9.2-9.3` Set the index to `logs`. Only the `logs` endpoint is available in these versions. 
+- {applies_to}`serverless: preview` {applies_to}`stack: preview 9.4+` Set the index to `logs.otel` or `logs.ecs`, depending on which endpoint you want to use. +::: + +```yaml +processors: + transform/logs-streams: + log_statements: + - context: resource + statements: + - set(attributes["elasticsearch.index"], "logs.otel") # Set to `logs.otel` or `logs.ecs` (serverless and stack 9.4+), or `logs` (stack 9.2–9.3) +service: + pipelines: + logs: + receivers: [myreceiver] # works with any logs receiver + processors: [transform/logs-streams] + exporters: [elasticsearch, otlp] # works with either +``` +:::: + +::::{tab-item} Filebeat +:::{note} +Set the index based on your {{stack}} version: + +- {applies_to}`stack: preview 9.2-9.3` Set the index to `logs`. Only the `logs` endpoint is available in these versions. +- {applies_to}`serverless: preview` {applies_to}`stack: preview 9.4+` Set the index to `logs.otel` or `logs.ecs`, depending on which endpoint you want to use. +::: + +```yaml +filebeat.inputs: + - type: filestream + id: my-filestream-id + index: logs.otel # Set to `logs.otel` or `logs.ecs` (serverless and stack 9.4+), or `logs` (stack 9.2–9.3) + enabled: true + paths: + - /var/log/*.log + +# No need to install templates for wired streams +setup: + template: + enabled: false + +output.elasticsearch: + hosts: [""] + api_key: "" +``` +:::: + +::::{tab-item} Logstash +:::{note} +Set the index based on your {{stack}} version: + +- {applies_to}`stack: preview 9.2-9.3` Set the index to `logs`. Only the `logs` endpoint is available in these versions. +- {applies_to}`serverless: preview` {applies_to}`stack: preview 9.4+` Set the index to `logs.otel` or `logs.ecs`, depending on which endpoint you want to use. 
+::: + +```json +output { + elasticsearch { + hosts => [""] + api_key => "" + index => "logs.otel" # Set to `logs.otel` or `logs.ecs` (serverless and stack 9.4+), or `logs` (stack 9.2–9.3) + action => "create" + } +} +``` +:::: + +::::{tab-item} Fleet +Use the **Custom Logs (Filestream)** integration to send data to wired streams: + +1. Find **{{fleet}}** in the navigation menu or use the [global search field](../../../explore-analyze/find-and-organize/find-apps-and-objects.md). +1. Select the **Settings** tab. +1. Under **Outputs**, find the output you want to use and select the {icon}`pencil` icon. +1. Turn on **Write to logs streams**. +1. Add the **Custom Logs (Filestream)** integration to an agent policy. +1. Enable the **Use the "logs" data stream** setting under **Change defaults**. +1. Under **Where to add this integration**, select an agent policy that uses the output configured in step 4. +:::: + +::::{tab-item} API +:::{note} +Set the endpoint based on your {{stack}} version: + +- {applies_to}`stack: preview 9.2-9.3` Set the endpoint to `logs`. Only the `logs` endpoint is available in these versions. +- {applies_to}`serverless: preview` {applies_to}`stack: preview 9.4+` Set the endpoint to `logs.otel` or `logs.ecs`, depending on which endpoint you want to use. +::: + +Send data to the endpoint using the [Bulk API]({{es-apis}}operation/operation-bulk): + +```json +POST /logs.otel/_bulk # Set to `logs.otel` or `logs.ecs` (serverless or stack 9.4+), or `logs` (stack 9.2–9.3) +{ "create": {} } +{ "@timestamp": "2025-05-05T12:12:12", "body": { "text": "Hello world!" 
}, "resource": { "attributes": { "host.name": "my-host-name" } } } +{ "create": {} } +{ "@timestamp": "2025-05-05T12:12:12", "message": "Hello world!", "host.name": "my-host-name" } +``` +:::: + +::::: + +## Work with existing data [get-data-in-classic] + +Use classic streams when you want the ease of extracting fields and configuring data retention while working with data that's already being ingested into {{es}}. + +Classic streams: + +- Are based on existing data streams, index templates, and component templates. +- Can follow the data retention policy set in the existing index template. +- Do not support hierarchical inheritance or cascading configuration updates. + +No additional configuration is required. Open Streams from {{kib}} and your existing data streams appear automatically. + +:::::{stepper} + +::::{step} Open Streams +Open Streams from the following places in {{kib}}: + +- Select **Streams** from the navigation menu or use the [global search field](../../../explore-analyze/find-and-organize/find-apps-and-objects.md). + +- Open the data stream for a specific document from **Discover**. To do this, expand the details flyout for a document stored in a data stream, and select **Stream** or an action associated with the document's data stream. Streams then opens filtered to the selected data stream. + +You can also access Streams features using the Streams API. {applies_to}`stack: preview 9.1+` {applies_to}`serverless: preview` Refer to the [Streams API documentation]({{kib-apis}}group/endpoint-streams) for more information. +:::: + +::::{step} Verify data is flowing +After configuring your data source, confirm data is appearing in Discover. + +For wired streams, you first need to make the index pattern available: + +1. Manually [create a {{data-source}}](../../../explore-analyze/find-and-organize/data-views.md#settings-create-pattern) for the wired streams index pattern (`logs,logs.*`). +1. 
Add the wired streams index pattern (`logs,logs.*`) to the `observability:logSources` {{kib}} advanced setting, which you can open from the navigation menu or by using the [global search field](../../../explore-analyze/find-and-organize/find-apps-and-objects.md). + +Once data appears in Discover, you're ready to start organizing, parsing, and configuring retention for your streams. +:::: + +::::: diff --git a/solutions/observability/streams-new/manage-data-quality.md b/solutions/observability/streams-new/manage-data-quality.md new file mode 100644 index 0000000000..72d95746c7 --- /dev/null +++ b/solutions/observability/streams-new/manage-data-quality.md @@ -0,0 +1,104 @@ +--- +navigation_title: Manage data quality +applies_to: + serverless: preview + stack: preview =9.1, ga 9.2+ +description: Monitor Streams data quality by tracking degraded documents, failed documents, quality scores, and ingestion trends over time. +products: + - id: observability + - id: elasticsearch + - id: kibana + - id: cloud-serverless + - id: cloud-hosted + - id: cloud-enterprise + - id: cloud-kubernetes + - id: elastic-stack +--- + +# Manage your data quality with Streams [streams-data-quality] + +When documents fail during ingestion, you can lose data without knowing what went wrong. Streams preserves failed documents and gives you the tools to find, understand, and fix data quality issues without leaving the Streams UI. + +The Streams **Data quality** tab provides a single place to monitor and resolve data quality issues: + +- **Failed documents are preserved, not dropped**: When a processing error occurs, data lands in a [failure store](#streams-data-quality-failure) instead of being lost, so nothing is silently discarded. +- **No separate infrastructure needed**. +- **See exactly what is failing and why**: The **Data quality** tab shows failure counts, error types, and sample messages, so you can identify problems immediately rather than searching for them. 
+- **Fix issues against the actual failing documents**: Instead of re-ingesting data from the source, you iterate on the processor using the exact documents that failed, with real-time validation before deploying the fix. + +## Find and fix data quality issues [streams-data-quality-workflow] + +Use the following steps to identify failing documents, trace the source of failure, and deploy a fix. + +:::::::{stepper} + +::::::{step} Check the quality score + +From the main **Streams** page, the **Data quality** column shows the health of each stream at a glance: **Good**, **Degraded**, or **Poor**. Filter by quality status to see streams that need attention, then click the quality score for a stream to open its **Data quality** tab. + +:::{dropdown} How is the quality score calculated? + +Streams uses the following {{esql}} queries to calculate the quality score for a stream: + +- All documents (including failed documents): `FROM , ::failures | STATS doc_count = COUNT(*)` +- Failed documents only: `FROM ::failures | STATS failed_doc_count = COUNT(*)` +- Degraded documents: `FROM METADATA _ignored | WHERE _ignored IS NOT NULL | STATS degraded_doc_count = COUNT(*)` + +Streams calculates data quality as follows: + +- **Good:** Both the **Degraded documents** percentage and the **Failed documents** percentage are 0. +- **Degraded:** Either the **Degraded documents** percentage or the **Failed documents** percentage is greater than 0 and less than or equal to 3. +- **Poor:** Either the **Degraded documents** percentage or the **Failed documents** percentage is greater than 3. +::: + +:::::: + +::::::{step} Review failures and trends + +The **Data quality** tab shows a breakdown of what's wrong and when it started: + +- **Failed documents**: Documents rejected during ingestion because of mapping conflicts or pipeline failures. Failed documents land in the [failure store](#streams-data-quality-failure) rather than being dropped. 
+- **Degraded documents**: Documents that were ingested but with fields silently ignored because of mapping issues or values that exceeded configured limits. +- **Quality score**: An overall health rating based on the percentage of failed and degraded documents. +- **Trends over time**: A time-series chart showing when the problem started and whether it's getting worse. + +Use the chart to understand the scope of the issue before drilling into individual documents. +:::::: + +::::::{step} Enable the failure store + +To inspect failed documents, you need to turn on the failure store. If the failure store is off, the **Failed documents** component shows **Enable failure store**. Select it and set a retention period. + +Once the failure store is on, Streams captures all rejected documents in a `::failures` index. Each failed document includes the error message from the processor that caused the rejection, so you can trace the failure back to its source. + +Refer to [Failure store](#streams-data-quality-failure) for permission requirements and setup details. +:::::: + +::::::{step} Fix and validate + +Navigate to the **Processing** tab to edit the failing processor. The **Data preview** pane lets you test your changes against the actual documents that failed, so you can confirm the fix works against real data before deploying it. + +When the preview looks right, select **Save changes**. Streams applies the updated pipeline to all future incoming documents. + +Refer to [Process your documents](./parse-and-process.md) for detailed guidance on editing processors and using the data preview. +:::::: + +::::::: + +## Failure store [streams-data-quality-failure] + +A [failure store](../../../manage-data/data-store/data-streams/failure-store.md) is a secondary set of indices inside a data stream, dedicated to storing failed documents. 
Instead of losing documents that are rejected during ingestion, a failure store retains them in a `::failures` index, so you can review failed documents to understand what went wrong and how to fix it. + +For example, for a stream called `my-stream`, Streams fetches all documents from the `my-stream::failures` index from within the specified time range in the date picker. + +### Required permissions +:::{include} ../../_snippets/failure-store-permissions.md +::: + +## Create a data quality alert [streams-data-quality-alert] + +To get notified when the percentage of degraded documents in a stream exceeds a threshold, create an alert rule from the **Data quality** tab. + +1. Open the **Data quality** tab for the stream you want to monitor. +2. Select **Create rule** ({icon}`bell`). +3. [Define the conditions](../incident-management/create-a-degraded-docs-rule.md#degraded-docs-rule-conditions) for your rule. \ No newline at end of file diff --git a/solutions/observability/streams-new/organize-your-data.md b/solutions/observability/streams-new/organize-your-data.md new file mode 100644 index 0000000000..69e42ce6b4 --- /dev/null +++ b/solutions/observability/streams-new/organize-your-data.md @@ -0,0 +1,93 @@ +--- +applies_to: + serverless: preview + stack: preview 9.2+ +description: Learn how to organize your data streams using routing and partitioning. +products: + - id: observability + - id: elasticsearch + - id: kibana + - id: cloud-serverless + - id: cloud-hosted + - id: cloud-enterprise + - id: cloud-kubernetes + - id: elastic-stack +--- + +# Organize your data + +When logs from multiple sources flow into a single wired stream, partitioning lets you route subsets of that data into dedicated child streams. Each child stream can then be managed independently, with its own retention policy, processing rules, and field mappings, while automatically inheriting the parent's defaults. 
+ +For example, you can route firewall logs to a `logs.otel.firewall` child stream with a 7-day retention, and application logs to a `logs.otel.application` child stream with a 30-day retention, without duplicating any shared configuration. + +:::{note} +Partitioning is only available on [wired streams](./get-data-in.md#get-data-in-wired). If you're using classic streams or all your logs need identical treatment, skip this step. +::: + +## Partitioning recommendations [organize-partitioning-recommendations] + +Before creating partitions, keep the following in mind: + +- **Partition by logical groupings**, not by high-cardinality fields. Group logs by team, technology type, or environment (for example, `web-servers`, `application`, `security`) rather than by individual service names or host identifiers, which can generate too many streams to manage effectively. +- **Aim for tens of partitions, not hundreds.** Each partition creates a dedicated data stream in {{es}}. There is a cost to each one, so keep the number manageable. +- **Only partition when you need different lifecycle policies.** If all your logs can share the same retention and processing rules, a single stream is simpler to operate. + +## Partition your data [organize-partitioning] + +:::::{stepper} + +::::{step} Open the Partitioning tab +1. Open Streams from the navigation menu or use the [global search field](../../../explore-analyze/find-and-organize/find-apps-and-objects.md). +1. Select your wired stream from the list. +1. Go to the **Partitioning** tab. +:::: + +::::{step} Create a partition +Choose how to define partitions: manually using field-based conditions, or by letting AI analyze your data and suggest groupings. + +:::{dropdown} Create partitions manually +1. Select **Create partition manually**. +1. In the **Data preview**, hover over a field and select: + - {icon}`plus_circle` to route data where the field matches the value. 
+ - {icon}`minus_circle` to route data where the field does not match the value. +1. Under **Stream name**, give the child stream a name that reflects the condition. +1. Select **Save** to create the child stream. + +Under **Condition**, you can also set the field, comparator, and value directly. Turn on the **Syntax editor** to enter conditions in YAML. For more on conditions, refer to [Streamlang conditions](../streams/management/streamlang.md#streams-streamlang-conditions). +::: + +:::{dropdown} Suggest partitions with AI +**Requires a [Generative AI connector](kibana://reference/connectors-kibana/gen-ai-connectors.md).** + +{applies_to}`serverless: preview` {applies_to}`stack: preview 9.4+` + +1. Select **Suggest partitions with AI**. Streams analyzes your data and suggests groupings. +1. Review the suggested partitions, then **Accept** or **Reject** each one. +1. To refine the results, select **Modify suggestions**, provide guidance (for example, "Partition by service name and severity level"), and submit. Streams regenerates suggestions based on your input. +1. Continue refining as needed, or select **Try again** to start over. +1. After accepting, review the generated **Stream name** and **Condition**. +1. Select **Create stream**. + +{applies_to}`stack: preview 9.2-9.3` + +1. Select **Suggest partitions with AI**. Streams analyzes your data and suggests groupings. +1. **Accept** or **Reject** the suggestions. After accepting, review the **Stream name** and **Condition**. +1. Select **Create stream**. +::: +:::: + +::::{step} Review the stream hierarchy +After saving, your stream list updates to show the parent-child relationship. For example: + +``` +logs +├── logs.otel.application [30d retention] +└── logs.otel.firewall [7d retention] +``` + +Child streams automatically inherit the parent's field mappings, lifecycle settings, and processors. You can override any inherited setting at the child level without affecting the parent or other children. 
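As a rough mental model only, inheritance with child-level overrides can be sketched in plain Python. This is not the Streams API. The sketch assumes that a child stream's name extends its parent's name with a dot-separated segment (as the example names suggest), and the helper names, the `90d` value, and the `logs.otel.nginx` stream are hypothetical.

```python
from typing import Optional


def ancestors(stream: str) -> list[str]:
    """Return ancestor stream names, nearest first.

    Assumption: hierarchy is encoded in dot-separated name segments.
    """
    parts = stream.split(".")
    return [".".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]


def effective_retention(stream: str, explicit: dict[str, str]) -> Optional[str]:
    """Use the stream's own retention if set, else the nearest ancestor's."""
    for name in [stream, *ancestors(stream)]:
        if name in explicit:
            return explicit[name]
    return None  # nothing set anywhere in the hierarchy


# Hypothetical settings: only these streams have an explicit retention.
explicit_retention = {
    "logs": "90d",
    "logs.otel.application": "30d",
    "logs.otel.firewall": "7d",
}

# A child with its own setting keeps it; one without falls back to an ancestor.
print(effective_retention("logs.otel.firewall", explicit_retention))
print(effective_retention("logs.otel.nginx", explicit_retention))
```

In this sketch, changing the value for `logs` changes the result for every descendant that has no explicit entry, while streams with their own entry are unaffected. That mirrors the override behavior described above.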
+:::: + +::::: + +After partitioning, each child stream can be configured independently. You're ready to add [processing rules](./parse-and-process.md) to extract fields, set [retention policies](./configure-retention.md) per stream, or monitor [data quality](./manage-data-quality.md) for individual streams. diff --git a/solutions/observability/streams-new/parse-and-process.md b/solutions/observability/streams-new/parse-and-process.md new file mode 100644 index 0000000000..1ff3bdf2fe --- /dev/null +++ b/solutions/observability/streams-new/parse-and-process.md @@ -0,0 +1,242 @@ +--- +navigation_title: Process documents +applies_to: + serverless: ga + stack: preview =9.1, ga 9.2+ +description: Process and extract fields from incoming Streams documents using configurable processors, conditions, and live simulation previews. +products: + - id: observability + - id: elasticsearch + - id: kibana + - id: cloud-serverless + - id: cloud-hosted + - id: cloud-enterprise + - id: cloud-kubernetes + - id: elastic-stack +--- +# Process your documents with Streams [streams-extract-fields] + +Most log data arrives as unstructured text. To filter, search, and analyze it effectively, you need to extract fields from that raw content. For example, extracted fields let you filter for log messages with an `ERROR` log level that occurred during a specific time period to help diagnose an issue. + +The Streams **Processing** tab provides a single place to build and manage your document processing pipeline: + +- **[Add processors and conditions](#streams-add-processors)**: Use the Streams UI without needing to manually configure pipeline JSON or Grok syntax. +- {applies_to}`serverless: preview` {applies_to}`stack: preview 9.3+` **[Generate pipeline suggestions using AI](#streams-generate-pipeline-suggestions)**: Let Streams analyze sample documents and suggest pipeline patterns, so you're refining instead of writing from scratch. 
+- **[Preview changes](#streams-preview-changes)**: Use the data preview to view which fields your pattern extracts per document, so you can verify field extraction before saving. +- **[Detect and resolve processing issues](#streams-detect-failures)**: Identify which processor or condition is causing documents to fail during processing. +- **[Catch mapping conflicts](#streams-processing-mapping-conflicts)**: Identify potential mapping conflicts before they cause cluster-wide failures. Streams simulates the indexing process end-to-end before deploying. + +## Add and configure processors [streams-add-processors] + +Streams uses [{{es}} ingest pipelines](../../../manage-data/ingest/transform-enrich/ingest-pipelines.md) made up of processors and conditions to transform your data, without requiring you to switch interfaces and manually update pipelines. + +To add a processor from the **Processing** tab: + +:::::::{stepper} +::::::{step} Add processors and conditions +:anchor: streams-generate-pipeline-suggestions + +Use any combination of the following options to build your processing pipeline: + +- **Suggest a pipeline**: Let Streams analyze your sample data and generate a complete processor pipeline using AI. +- **Manually add processors**: Select and configure individual processors yourself when you know which transformations you need. +- **Add conditions**: Attach Boolean expressions to define when to run processors. + +:::::{tab-set} + +::::{tab-item} Suggest a pipeline + +```{applies_to} +stack: preview 9.3+ +serverless: preview +``` + +:::{note} +This feature requires a [Generative AI connector](kibana://reference/connectors-kibana/gen-ai-connectors.md). +::: + +Setting up processors is generally a multi-step process. For example, you might need a grok processor to extract fields, a date processor to convert timestamps, and a remove processor to get rid of temporary fields. 
Instead of creating individual processors manually, you can have AI suggest an entire pipeline for you: + +1. From the **Processing** tab, select **Suggest a pipeline**. +1. Review the suggested processors, and either **Accept** or **Reject** the suggestions. +1. Select **Regenerate** to have Streams regenerate the suggested pipeline. Change the large language model (LLM) that Streams uses to generate suggestions from the {icon}`controls` menu. + +:::{dropdown} How does Suggest a pipeline work? +:::{include} ../../_snippets/streams-suggestions.md +::: +::: +:::: + +::::{tab-item} Manually add processors + +If you know which processors you want to use, you can add them manually from the Streams UI. Refer to the [Streamlang reference](../streams/management/streamlang.md) for supported processors and configuration details. + +1. Select **Create processor**. You can also let Streams suggest processors by selecting [Suggest a pipeline](#streams-generate-pipeline-suggestions). +1. Select a processor from the **Processor** menu. + + :::{note} + Let Streams suggest patterns for [Grok](../streams/management/extract/grok.md#streams-grok-patterns) and [dissect](../streams/management/extract/dissect.md#streams-dissect-patterns) processors by selecting **Generate pattern**. This feature requires a [Generative AI connector](kibana://reference/connectors-kibana/gen-ai-connectors.md). + ::: + +1. Configure the processor and select **Create** to save the processor. +1. Optional: Turn on **Ignore failures** if you want document processing to continue even when this processor fails. +1. Optional: For dissect, Grok, and rename processors, enable **Ignore missing fields** if you want processing to continue when a source field is missing. +:::: + +::::{tab-item} Add conditions + +You can add conditions—Boolean expressions evaluated for each document—and attach processors that run only when those conditions are met. + +To add a condition: + +1. Select **Create** → **Create condition**. 
+1. Provide a **Field**, a **Value**, and a comparator. +1. Select **Create condition**. +1. After creating a condition, add a processor or another condition to it by selecting the {icon}`plus_in_circle`. + +Refer to [Conditions](../streams/management/streamlang.md#streams-streamlang-conditions) in the Streamlang reference for supported operators and examples. +:::: + +::::: +:::::: + +::::::{step} Preview changes +:anchor: streams-preview-changes + +After adding processors and conditions, the **Data preview** tab simulates processor results with additional filtering options depending on the outcome of the simulation. + +When you add or edit processors, the **Data preview** tab updates automatically. + +The **Data preview** tab loads 100 documents from your existing data and runs your changes against them. +For any newly created processors and conditions, the preview results are reliable, and you can freely create and reorder during the preview. + +If you edit the stream after previewing your changes, keep the following in mind: + +- Adding processors to the end of the list works as expected. +- Editing or reordering existing processors can cause inaccurate results. Because the pipeline might have already processed the documents used for sampling, **Data preview** cannot accurately simulate changes to existing data. +:::::: + +::::::{step} Detect and resolve failures and mapping conflicts +:anchor: streams-detect-failures + +Streams helps you catch issues before you apply your processors: + +- **Failures**: A processor couldn't parse or transform a document, usually due to a mismatched pattern or missing field. +- **Mapping conflicts**: A processor produced a field type that conflicts with the existing index mapping. + +### Detect and resolve failures [streams-detect-failures-section] + +Documents can fail processing for various reasons. Streams helps you identify and resolve these issues before deploying changes. 
+ +In the following screenshot, the **Failed** percentage indicates that some messages didn't match the provided grok pattern: + +:::{image} ../../../images/logs-streams-parsed.png +:screenshot: +::: + +You can filter your documents by selecting **Parsed** or **Failed** on the **Data preview** tab. +Selecting **Failed** shows the documents that weren't parsed correctly: + +:::{image} ../../../images/logs-streams-failures.png +:screenshot: +::: + +Streams displays failures at the bottom of the process editor. Some failures might require fixes, while others serve as a warning: + +:::{image} ../../../images/logs-streams-processor-failures.png +:screenshot: +::: + +### Detect mapping conflicts [streams-processing-mapping-conflicts] + +As part of processing, Streams also simulates your changes end-to-end to check for mapping conflicts. If it detects a conflict, Streams marks the processor as failed and displays a message like the following: + +:::{image} ../../../images/logs-streams-mapping-conflicts.png +:screenshot: +::: + +Use the information in the failure message to find and troubleshoot the mapping issues. +:::::: + +::::::{step} Save changes +After adding all of your processors and conditions and addressing any issues, select **Save changes**. Streams then parses all future data ingested into the stream according to your pipeline. + +:::{note} +Applied changes aren't retroactive and only affect future ingested data. +::: + +:::::: +::::::: + +## Switch between interactive and YAML editing modes [streams-editing-modes] + +The Streams processing UI provides an [interactive mode](#streams-editing-interactive-mode) and a [YAML mode](#streams-editing-yaml-mode) for editing processors and conditions. + +To switch modes, select the appropriate tab from the top of the processing page. 
+ +:::{image} ../../../images/streams-editing-modes.png +:screenshot: +::: + +Streams defaults to interactive mode unless the configuration can't be represented in interactive mode (for example, when nesting levels are too deep). + +### When to use Interactive mode [streams-editing-interactive-mode] + +**Interactive** mode provides a form-based interface for creating and editing processors. This mode works best for: + +- Users who prefer a guided, visual approach +- Configurations that don't require deeply nested conditions + +### When to use YAML mode [streams-editing-yaml-mode] +```{applies_to} +stack: ga 9.3+ +``` + +**YAML** mode provides a code editor for writing Streamlang directly. This mode works best for: + +- Users who prefer working with code +- Advanced configurations with complex or deeply nested conditions + +Refer to the [Streamlang reference](../streams/management/streamlang.md) for the complete syntax, condition operators, and examples. + +## Known limitations [streams-known-limitations] + +- Streams does not support all processors. Refer to the [Streamlang reference](../streams/management/streamlang.md) for supported processors. +- The data preview simulation might not accurately reflect the changes to the existing data when editing existing processors or re-ordering them. +- Streams can't properly handle arrays. While it supports basic actions like appending or renaming, it can't access individual array elements. For classic streams, the workaround is to use the [manual pipeline configuration](../streams/management/extract/manual-pipeline-configuration.md) that supports {{product.painless}} scripting and all ingest processors. + +## Advanced: How Streams applies processing changes to the underlying data stream [streams-applied-changes] + +::::{dropdown} How Streams applies processing changes + +When you save processors, Streams appends processing to the best-matching ingest pipeline for the data stream. 
It either chooses the best-matching pipeline ending in `@custom` in your data stream, or it adds one for you. + +Streams identifies the appropriate `@custom` pipeline (for example, `logs-myintegration@custom` or `logs@custom`) by checking the `default_pipeline` that is set on the data stream. You can view the default pipeline on the **Advanced** tab under **Ingest pipeline**. + +In this default pipeline, Streams locates the last processor that calls a pipeline ending in `@custom`. +- For integrations, this would result in a pipeline name like `logs-myintegration@custom`. +- Without an integration, the only `@custom` pipeline available may be `logs@custom`. + +If no default pipeline is detected, Streams adds a default pipeline to the data stream by updating the index templates. + +If a default pipeline is detected, but it does not contain a custom pipeline, Streams adds the pipeline processor directly to the pipeline. + +Streams then adds a pipeline processor to the end of that `@custom` pipeline. This processor definition directs matching documents to a dedicated pipeline managed by Streams called `@stream.processing`: + +```json +// Example processor added to the relevant @custom pipeline +{ + "pipeline": { + "name": "@stream.processing", // for example, logs-my-app-default@stream.processing + "if": "ctx._index == ''", + "ignore_missing_pipeline": true, + "description": "Call the stream's managed pipeline - do not change this manually but instead use the Streams UI or API" + } +} +``` + +Streams then creates and manages the `@stream.processing` pipeline, adding the [processors](#streams-add-processors) you configured in the UI. + +Do not manually modify the `@stream.processing` pipeline created by Streams. +You can still add your own processors manually to the `@custom` pipeline if needed. Adding processors before the pipeline processor created by Streams might cause unexpected behavior. 
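
The lookup rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as this section states it, not the Streams implementation: the function name `find_last_custom_pipeline` and the sample pipeline body are hypothetical, and the input is assumed to be an ingest pipeline body with a top-level `processors` list.

```python
from typing import Optional


def find_last_custom_pipeline(default_pipeline: dict) -> Optional[str]:
    """Return the name of the last `@custom` pipeline called by a pipeline body.

    Hypothetical helper: `default_pipeline` is assumed to be an ingest
    pipeline body with a top-level `processors` list.
    """
    last_match = None
    for processor in default_pipeline.get("processors", []):
        # Each processor is a single-key object, for example {"pipeline": {...}}.
        pipeline_call = processor.get("pipeline")
        if pipeline_call is None:
            continue
        name = pipeline_call.get("name", "")
        if name.endswith("@custom"):
            # Keep overwriting so the last matching processor wins.
            last_match = name
    return last_match


# Hypothetical default pipeline for an integration data stream.
example_default_pipeline = {
    "processors": [
        {"set": {"field": "event.ingested", "value": "{{_ingest.timestamp}}"}},
        {"pipeline": {"name": "logs@custom", "ignore_missing_pipeline": True}},
        {"pipeline": {"name": "logs-myintegration@custom", "ignore_missing_pipeline": True}},
    ]
}

print(find_last_custom_pipeline(example_default_pipeline))
```

With the sample body, the helper returns `logs-myintegration@custom`. For a body with no `@custom` pipeline processor it returns `None`, which corresponds to the case where Streams adds the pipeline processor itself.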
+:::: \ No newline at end of file diff --git a/solutions/observability/streams-new/streams.md b/solutions/observability/streams-new/streams.md new file mode 100644 index 0000000000..893c91a9a5 --- /dev/null +++ b/solutions/observability/streams-new/streams.md @@ -0,0 +1,81 @@ +--- +applies_to: + serverless: ga + stack: preview =9.1, ga 9.2+ +description: Streams provides a centralized UI for extracting fields, setting retention, routing data, and managing Elasticsearch data streams. +products: + - id: observability + - id: elasticsearch + - id: kibana + - id: cloud-serverless + - id: cloud-hosted + - id: cloud-enterprise + - id: cloud-kubernetes + - id: elastic-stack +--- + +# Streams-new + +Streams allows you to automatically parse, structure, and organize your log data so you can query it immediately, without writing Grok expressions or maintaining custom pipelines. + +When an incident hits, Streams gets you to answers faster. AI-powered detection continuously scans your logs for critical signals. Instead of manually scanning thousands of log lines, you get a prioritized list of what matters. + +## Use Streams to... + +**Organize logs automatically** +: Streams uses AI to partition your log data by source and component, without manual regex rules or pipeline configuration. As new log formats arrive, Streams continues to learn and extend its partitioning automatically. + +**Get meaning from logs** +: The AI-powered processing pipeline detects log formats and generates parsing rules that extract structured fields from unstructured text. You get clean, queryable data without writing a single Grok expression. + +**Solve incidents in minutes, not hours** +: Significant Events detection continuously scans your streams for critical signals: out-of-memory errors, crash loops, certificate expirations, and anomalies. + +**Reduce time spent on managing pipelines** +: Streams uses AI to simplify parsing, enrichment, partitioning, and schema updates. 
You can start investigating issues within minutes, rather than spending weeks on pipeline setup and data engineering. + +**Control storage costs** +: By surfacing the most critical logs and automatically structuring data for efficient storage, Streams allows you to retain high-value data without discarding important information, reducing overall storage costs. + + +## Quick tour + +This is a quick overview of the main steps to get started with Streams in {{kib}}. It covers how to get data in, organize it into streams, parse and enrich your logs, set retention policies, and monitor data quality. + +This tour is an ideal way to familiarize yourself with the Streams UI and its core workflows. You can follow along directly in your {{ecloud}} or self-managed {{es}} environment. + + +:::::{stepper} + +::::{step} Get data in +Send logs via OpenTelemetry, Fluentd, Fluent Bit, or an Elastic integration. For agentless ingest, send logs directly to the `/logs` endpoint. +:::: + +::::{step} Organize your data + +:::{dropdown} From {{kib}} +- Select **Streams** from the navigation menu or use the [global search field](../../../explore-analyze/find-and-organize/find-apps-and-objects.md). +- Open the data stream for a specific document from **Discover**. To do this, expand the details flyout for a document that's stored in a data stream, and select **Stream** or an action associated with the document's data stream. Streams then opens filtered to the selected data stream. +::: + +:::{dropdown} Using the API +{applies_to}`stack: preview 9.1` {applies_to}`serverless: preview` You can also access Streams features using the Streams API. Refer to the [Streams API documentation]({{kib-apis}}group/endpoint-streams) for more information. +::: +:::: + +::::{step} Parse and process +Streams automatically organizes your logs by source and component. Accept, adjust, or add [**partitions**](./organize-your-data.md) manually. 
Use the [**Processing** tab](./parse-and-process.md) to parse and extract fields from log messages. Accept AI-generated Grok rules or write your own. +:::: + +::::{step} Configure retention +Use the [**Retention** tab](./configure-retention.md) to define how long each stream stores data and to review ingestion volume. +:::: + +::::{step} Manage data quality +Use the [**Data quality** column](./manage-data-quality.md) to filter your streams by data quality status. +:::: + +::::: + diff --git a/solutions/observability/streams-new/use-cases.md b/solutions/observability/streams-new/use-cases.md new file mode 100644 index 0000000000..f2e63a8a2d --- /dev/null +++ b/solutions/observability/streams-new/use-cases.md @@ -0,0 +1,31 @@ +--- +applies_to: + serverless: ga + stack: preview =9.1, ga 9.2+ +description: Explore common use cases for Streams. +products: + - id: observability + - id: elasticsearch + - id: kibana + - id: cloud-serverless + - id: cloud-hosted + - id: cloud-enterprise + - id: cloud-kubernetes + - id: elastic-stack +--- + +# Use cases + +**Incident investigation** + +An SRE receives an alert that a trading application is down. Instead of manually searching through +millions of log lines, they open Streams, where Significant Events has already surfaced a Java +out-of-memory error with the relevant context. In minutes, not hours, they identify the root +cause, escalate to the right team, and restore service. + +**High-volume log management for platform teams** + +A platform team ingests logs from dozens of microservices and needs to control costs without losing +context. Using Streams, they set per-stream retention policies, route high-value logs to longer +retention tiers, and use the failure store to catch and investigate parsing errors, all from a +single UI.
\ No newline at end of file diff --git a/solutions/observability/streams/streams.md b/solutions/observability/streams/streams.md index 8807ce58a1..ff8a5a8ae4 100644 --- a/solutions/observability/streams/streams.md +++ b/solutions/observability/streams/streams.md @@ -14,7 +14,7 @@ products: - id: elastic-stack --- -# Streams +# Streams-old Streams provides a single, centralized UI within {{kib}} that streamlines common tasks like extracting fields, setting data retention, and routing data, so you don't need to use multiple applications or manually configure underlying {{es}} components. diff --git a/solutions/toc.yml b/solutions/toc.yml index f547d1759b..cc11db779a 100644 --- a/solutions/toc.yml +++ b/solutions/toc.yml @@ -459,6 +459,15 @@ toc: - file: observability/logs/logs-index-template-reference.md children: - file: observability/logs/logs-index-template-defaults.md + - file: observability/streams-new/streams.md + children: + - file: observability/streams-new/get-data-in.md + - file: observability/streams-new/organize-your-data.md + - file: observability/streams-new/parse-and-process.md + - file: observability/streams-new/configure-retention.md + - file: observability/streams-new/manage-data-quality.md + - file: observability/streams-new/use-cases.md + - file: observability/streams/streams.md children: - file: observability/streams/management/retention.md @@ -487,7 +496,7 @@ toc: - file: observability/streams/management/partitioning.md - file: observability/streams/management/schema.md - file: observability/streams/management/data-quality.md - - hidden: observability/streams/management/significant-events.md + - file: observability/streams/management/significant-events.md - file: observability/streams/management/advanced.md - file: observability/streams/management/knowledge-indicators.md - file: observability/streams/wired-streams.md