diff --git a/docs/05-administrator-guide/02-configuration/04-performance/01-placeholder.md b/docs/05-administrator-guide/02-configuration/04-performance/01-placeholder.md index 6f2c1e3d87..db823a9403 100644 --- a/docs/05-administrator-guide/02-configuration/04-performance/01-placeholder.md +++ b/docs/05-administrator-guide/02-configuration/04-performance/01-placeholder.md @@ -67,5 +67,5 @@ A typical production Ozone cluster includes the following services: ### Ozone Configuration - **Monitoring**: Install [Prometheus](../../operations/observability/prometheus) and [Grafana](../../operations/observability/grafana) for monitoring the Ozone cluster. For audit logs, consider using a log ingestion framework such as the ELK Stack (Elasticsearch, Logstash, and Kibana) with FileBeat, or other similar frameworks. Alternatively, you can use Apache Ranger to manage audit logs. -- **Pipeline Limits**: Increase the number of allowed write pipelines to better suit your workload by adjusting `ozone.scm.datanode.pipeline.limit`. See the [Multi-Raft](./multi-raft) documentation for the formula to calculate appropriate pipeline limits based on your metadata disk configuration. +- **Pipeline Limits**: Increase the number of allowed write pipelines to better suit your workload by adjusting `ozone.scm.datanode.pipeline.limit`. See [Calculating Ratis Pipeline Limits](./calculating-ratis-pipeline-limits) for the sizing formula, trade-offs, and production guidance. See [Multi-Raft](./multi-raft) for Datanode metadata directory configuration. - **Heap Sizes**: Configure sufficient heap sizes for Ozone Manager (OM), Storage Container Manager (SCM), Recon, Datanode, S3 Gateway (S3G), and HttpFS services to ensure stability. 
diff --git a/docs/05-administrator-guide/02-configuration/04-performance/07-calculating-ratis-pipeline-limits.md b/docs/05-administrator-guide/02-configuration/04-performance/07-calculating-ratis-pipeline-limits.md index 216a0201ed..c6420d47d8 100644 --- a/docs/05-administrator-guide/02-configuration/04-performance/07-calculating-ratis-pipeline-limits.md +++ b/docs/05-administrator-guide/02-configuration/04-performance/07-calculating-ratis-pipeline-limits.md @@ -2,6 +2,8 @@ sidebar_label: Calculating Ratis Pipeline Limits --- +import RatisPipelineCalculator from '@site/src/components/RatisPipelineCalculator'; + # Calculating Ratis Pipeline Limits ReplicationFactor.THREE is controlled by three configuration properties that limit the @@ -51,10 +53,59 @@ And the configuration is: SCM will attempt to create and maintain approximately **24** open, FACTOR_THREE Ratis pipelines. -**Production Recommendation:** + + +## Sizing Trade-offs + +Pipeline limits balance write throughput against Datanode resource usage. Each open Ratis pipeline is one Ratis +replication group, so the number of pipelines directly affects how much concurrent write capacity the cluster +exposes. + +**Benefits of more pipelines** + +- **Higher write parallelism**: Each pipeline is an independent Ratis group, so more pipelines allow more + concurrent write streams across the cluster. +- **Lower write latency under contention**: With more pipelines, different clients are less likely to share the + same Ratis group, reducing queueing on a single Raft leader. + +**Costs of too many pipelines** + +- **Datanode resource pressure**: Each pipeline consumes CPU, memory, and metadata-disk I/O on the Datanodes + that participate in it. Very high pipeline counts can overload individual nodes. +- **Follower lag and tail latency**: When a Datanode hosts many Raft groups, followers can fall behind on log + replication (index lag). 
This delays WATCH responses and increases write tail latency for pipelines on that + node. + +### Ozone vs HDFS write pipelines + +Ozone and HDFS take different approaches to grouping Datanodes for replicated writes: + +| | Ozone (Ratis pipelines) | HDFS (stateless pipelines) | +| --- | --- | --- | +| **Pipeline model** | SCM creates and tracks a bounded set of explicit Ratis pipelines | Any three Datanodes can form an ad hoc "pipeline" for three block replicas | +| **Control & visibility** | Centralized limits and monitoring via SCM (`ozone admin pipeline list`, SCM UI) | No equivalent cluster-wide pipeline registry | +| **Load spreading** | Bounded by configured pipeline limits; may cap peak write throughput | Blocks can spread across any DNs, spreading load more evenly | +| **Operational trade-off** | More predictable capacity planning; may sacrifice some peak throughput | Higher potential throughput; harder to observe and cap concurrent write groups | + +![Ozone Ratis pipelines vs HDFS stateless block placement](ratis-vs-hdfs-pipelines.png) + +## Production Recommendation + +For most production deployments, start with the dynamic per-disk limit (`ozone.scm.datanode.pipeline.limit=0`). +This lets pipeline capacity scale naturally with metadata disk resources. A good starting value for +`ozone.scm.pipeline.per.metadata.disk` is **2**. + +Increase pipeline limits only when you observe write contention—for example, many clients sharing the same +pipelines or sustained write queueing—and Datanode resources (CPU, memory, metadata-disk I/O) have headroom. +Use `ozone.scm.ratis.pipeline.limit` as a safety cap rather than the primary tuning knob. + +Monitor the **Pipeline Statistics** section in the SCM web UI, or run `ozone admin pipeline list`, to confirm +the actual number of open pipelines aligns with your configured targets. Watch for follower index lag or +elevated resource usage on Datanodes as signals that pipeline counts may be too high. 
+ +## Related Topics -For most production deployments, using the dynamic per-disk limit (`ozone.scm.datanode.pipeline.limit=0`) is -recommended, as it allows the cluster to scale pipeline capacity naturally with its resources. You can use the -global limit (`ozone.scm.ratis.pipeline.limit`) as a safety cap if needed. A good starting value for -`ozone.scm.pipeline.per.metadata.disk` is **2**. Monitor the section **Pipeline Statistics** in SCM web UI, or run -the command `ozone admin pipeline list` to see if the actual number of pipelines aligns with your configured targets. +Limits on this page apply to **open Ratis pipelines** (ReplicationFactor.THREE). The number of OPEN containers +and containers per pipeline are separate concerns and are not covered here. For background on Multi-Raft and +Datanode metadata directories, see [Multi-Raft](./multi-raft). For EC pipeline sizing, see +[Calculating EC Pipeline Limits](./calculating-ec-pipeline-limits). diff --git a/docs/05-administrator-guide/02-configuration/04-performance/ratis-vs-hdfs-pipelines.png b/docs/05-administrator-guide/02-configuration/04-performance/ratis-vs-hdfs-pipelines.png new file mode 100644 index 0000000000..b37363f24e Binary files /dev/null and b/docs/05-administrator-guide/02-configuration/04-performance/ratis-vs-hdfs-pipelines.png differ diff --git a/src/components/RatisPipelineCalculator/calculateRatisPipelineLimit.js b/src/components/RatisPipelineCalculator/calculateRatisPipelineLimit.js new file mode 100644 index 0000000000..4235140ad7 --- /dev/null +++ b/src/components/RatisPipelineCalculator/calculateRatisPipelineLimit.js @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +const REPLICATION_FACTOR = 3; + +function toPositiveInt(value, fallback = 0) { + const parsed = Number.parseInt(String(value), 10); + if (!Number.isFinite(parsed) || parsed < 0) { + return fallback; + } + return parsed; +} + +/** + * Calculate SCM's target number of open FACTOR_THREE Ratis pipelines. + * + * Mirrors the formulas documented on the Calculating Ratis Pipeline Limits page. + */ +export function calculateRatisPipelineLimit({ + useDynamicLimit, + datanodePipelineLimit, + pipelinesPerMetadataDisk, + globalLimit, + diskGroups, + healthyDatanodes, +}) { + const steps = []; + let rawTarget = 0; + + if (useDynamicLimit) { + const perDisk = Math.max(toPositiveInt(pipelinesPerMetadataDisk, 2), 1); + const groups = Array.isArray(diskGroups) ? 
diskGroups : []; + let totalSlots = 0; + + groups.forEach((group, index) => { + const datanodeCount = toPositiveInt(group.datanodeCount, 0); + const metadataDisks = Math.max(toPositiveInt(group.metadataDisks, 0), 1); + const groupSlots = datanodeCount * perDisk * metadataDisks; + totalSlots += groupSlots; + + if (datanodeCount > 0) { + steps.push({ + label: `Group ${index + 1} pipeline slots`, + formula: `${datanodeCount} datanodes * (${perDisk} pipelines/disk * ${metadataDisks} disks/datanode)`, + value: groupSlots, + }); + } + }); + + // Show the floor explicitly so the displayed formula matches the computed value. + rawTarget = Math.floor(totalSlots / REPLICATION_FACTOR); + steps.push({ + label: 'Raw target from dynamic limit', + formula: `floor(${totalSlots} / ${REPLICATION_FACTOR})`, + value: rawTarget, + }); + } else { + const perDatanodeLimit = Math.max(toPositiveInt(datanodePipelineLimit, 2), 1); + const datanodes = toPositiveInt(healthyDatanodes, 0); + const totalSlots = perDatanodeLimit * datanodes; + rawTarget = Math.floor(totalSlots / REPLICATION_FACTOR); + + steps.push({ + label: 'Raw target from fixed datanode limit', + formula: `floor((${perDatanodeLimit} * ${datanodes} healthy datanodes) / ${REPLICATION_FACTOR})`, + value: rawTarget, + }); + } + + const cap = toPositiveInt(globalLimit, 0); + let finalTarget = rawTarget; + let globalLimitApplied = false; + + if (cap > 0) { + finalTarget = Math.min(rawTarget, cap); + globalLimitApplied = finalTarget < rawTarget; + steps.push({ + label: 'Compare with global limit', + formula: `min(${rawTarget}, ${cap})`, + value: finalTarget, + }); + } + + return { + steps, + rawTarget, + finalTarget, + globalLimitApplied, + }; +} + +export function createDefaultDiskGroups() { + return [ + { id: 'group-1', datanodeCount: 8, metadataDisks: 4 }, + { id: 'group-2', datanodeCount: 2, metadataDisks: 2 }, + ]; +} diff --git a/src/components/RatisPipelineCalculator/index.jsx b/src/components/RatisPipelineCalculator/index.jsx new file mode 100644 index 0000000000..1fb2864b86 --- /dev/null +++
b/src/components/RatisPipelineCalculator/index.jsx @@ -0,0 +1,302 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +import Heading from '@theme/Heading'; +import { useMemo, useState } from 'react'; +import { + calculateRatisPipelineLimit, + createDefaultDiskGroups, +} from './calculateRatisPipelineLimit'; +import styles from './styles.module.css'; + +let nextGroupId = 3; + +function NumberField({ id, label, hint, value, onChange, min = 0 }) { + return ( +
+ + onChange(event.target.value)} + /> + {hint ? {hint} : null} +
+ ); +} + +export default function RatisPipelineCalculator() { + const [useDynamicLimit, setUseDynamicLimit] = useState(true); + const [heterogeneousDisks, setHeterogeneousDisks] = useState(true); + const [globalLimit, setGlobalLimit] = useState(30); + const [datanodePipelineLimit, setDatanodePipelineLimit] = useState(2); + const [pipelinesPerMetadataDisk, setPipelinesPerMetadataDisk] = useState(2); + const [healthyDatanodes, setHealthyDatanodes] = useState(10); + const [uniformMetadataDisks, setUniformMetadataDisks] = useState(4); + const [diskGroups, setDiskGroups] = useState(createDefaultDiskGroups); + + const effectiveDiskGroups = useMemo(() => { + if (!useDynamicLimit) { + return []; + } + if (heterogeneousDisks) { + return diskGroups; + } + return [{ datanodeCount: healthyDatanodes, metadataDisks: uniformMetadataDisks }]; + }, [ + diskGroups, + healthyDatanodes, + heterogeneousDisks, + uniformMetadataDisks, + useDynamicLimit, + ]); + + const result = useMemo( + () => + calculateRatisPipelineLimit({ + useDynamicLimit, + datanodePipelineLimit, + pipelinesPerMetadataDisk, + globalLimit, + diskGroups: effectiveDiskGroups, + healthyDatanodes, + }), + [ + datanodePipelineLimit, + effectiveDiskGroups, + globalLimit, + healthyDatanodes, + pipelinesPerMetadataDisk, + useDynamicLimit, + ], + ); + + const updateDiskGroup = (id, field, value) => { + setDiskGroups((groups) => + groups.map((group) => + group.id === id ? { ...group, [field]: value } : group, + ), + ); + }; + + const addDiskGroup = () => { + setDiskGroups((groups) => [ + ...groups, + { + id: `group-${nextGroupId++}`, + datanodeCount: 1, + metadataDisks: 2, + }, + ]); + }; + + const removeDiskGroup = (id) => { + setDiskGroups((groups) => + groups.length > 1 ? groups.filter((group) => group.id !== id) : groups, + ); + }; + + return ( +
+ + Pipeline Limit Calculator + +

+ Estimate how many open FACTOR_THREE Ratis pipelines SCM targets, based on your + configuration values.

+ +
+
+ + +
+ + + + {useDynamicLimit ? ( + + ) : ( + + )} +
+ + {useDynamicLimit ? ( +
+
+ + +
+ + {heterogeneousDisks ? ( + <> + + Datanode groups + + + + + + + + + + {diskGroups.map((group) => ( + + + + + + ))} + +
DatanodesMetadata disks per datanode +
+ + updateDiskGroup(group.id, 'datanodeCount', event.target.value) + } + /> + + + updateDiskGroup(group.id, 'metadataDisks', event.target.value) + } + /> + + +
+
+ +
+ + ) : ( +
+ + +
+ )} +
+ ) : ( +
+ +
+ )} + +
+

+ Estimated open Ratis pipelines: {result.finalTarget} +

+
    + {result.steps.map((step) => ( +
  1. + {step.label}:{' '} + + {step.formula} = {step.value} + +
  2. + ))} +
+

+ SCM maintains approximately this many open, FACTOR_THREE Ratis pipelines. Actual + counts may differ slightly during cluster changes. + {result.globalLimitApplied + ? ' The global limit capped the computed target.' + : null} +

+
+
+ ); +} diff --git a/src/components/RatisPipelineCalculator/styles.module.css b/src/components/RatisPipelineCalculator/styles.module.css new file mode 100644 index 0000000000..494a052930 --- /dev/null +++ b/src/components/RatisPipelineCalculator/styles.module.css @@ -0,0 +1,177 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +.wrapper { + margin: 1.5rem 0 2rem; + padding: 1.5rem; + border: 1px solid var(--ifm-color-emphasis-300); + border-radius: 8px; + background: var(--ifm-background-surface-color); +} + +.title { + margin: 0 0 0.5rem; + font-size: 1.25rem; +} + +.description { + margin: 0 0 1.25rem; + color: var(--ifm-color-emphasis-700); +} + +.grid { + display: grid; + grid-template-columns: repeat(auto-fit, minmax(220px, 1fr)); + gap: 1rem; + margin-bottom: 1rem; +} + +.field { + display: flex; + flex-direction: column; + gap: 0.35rem; +} + +.label { + font-weight: 600; + font-size: 0.9rem; +} + +.hint { + font-size: 0.8rem; + color: var(--ifm-color-emphasis-600); +} + +.input, +.select { + padding: 0.55rem 0.65rem; + border: 1px solid var(--ifm-color-emphasis-300); + border-radius: 4px; + font-size: 0.95rem; + background: var(--ifm-background-color); + color: var(--ifm-font-color-base); +} + +.input:focus, +.select:focus { + outline: none; + border-color: var(--ifm-color-primary); + box-shadow: 0 0 0 2px rgba(53, 117, 0, 0.12); +} + +.section { + margin-top: 1rem; + padding-top: 1rem; + border-top: 1px solid var(--ifm-color-emphasis-200); +} + +.sectionTitle { + margin: 0 0 0.75rem; + font-size: 1rem; +} + +.groupTable { + width: 100%; + border-collapse: collapse; + margin-bottom: 0.75rem; +} + +.groupTable th, +.groupTable td { + padding: 0.5rem; + border-bottom: 1px solid var(--ifm-color-emphasis-200); + text-align: left; +} + +.groupTable th { + font-size: 0.85rem; + color: var(--ifm-color-emphasis-700); +} + +.buttonRow { + display: flex; + gap: 0.75rem; + flex-wrap: wrap; +} + +.secondaryButton, +.removeButton { + padding: 0.45rem 0.85rem; + border-radius: 4px; + font-size: 0.9rem; + cursor: pointer; +} + +.secondaryButton { + border: 1px solid var(--ifm-color-primary); + background: transparent; + color: var(--ifm-color-primary); +} + +.secondaryButton:hover { + background: var(--ifm-color-primary); + color: #fff; +} + +.removeButton { + border: 1px solid 
var(--ifm-color-emphasis-400); + background: transparent; + color: var(--ifm-color-emphasis-800); +} + +.removeButton:hover { + border-color: var(--ifm-color-danger); + color: var(--ifm-color-danger); +} + +.result { + margin-top: 1.25rem; + padding: 1rem 1.1rem; + border-radius: 6px; + background: var(--ifm-color-emphasis-100); +} + +.resultValue { + margin: 0 0 0.75rem; + font-size: 1.1rem; +} + +.resultValue strong { + color: var(--ifm-color-primary-darker); +} + +.stepList { + margin: 0; + padding-left: 1.25rem; +} + +.stepItem { + margin-bottom: 0.35rem; +} + +.stepFormula { + font-family: var(--ifm-font-family-monospace); + font-size: 0.85rem; +} + +.note { + margin: 0.75rem 0 0; + font-size: 0.85rem; + color: var(--ifm-color-emphasis-700); +} diff --git a/versioned_docs/version-2.1.0/05-administrator-guide/02-configuration/04-performance/01-placeholder.md b/versioned_docs/version-2.1.0/05-administrator-guide/02-configuration/04-performance/01-placeholder.md index 6f2c1e3d87..db823a9403 100644 --- a/versioned_docs/version-2.1.0/05-administrator-guide/02-configuration/04-performance/01-placeholder.md +++ b/versioned_docs/version-2.1.0/05-administrator-guide/02-configuration/04-performance/01-placeholder.md @@ -67,5 +67,5 @@ A typical production Ozone cluster includes the following services: ### Ozone Configuration - **Monitoring**: Install [Prometheus](../../operations/observability/prometheus) and [Grafana](../../operations/observability/grafana) for monitoring the Ozone cluster. For audit logs, consider using a log ingestion framework such as the ELK Stack (Elasticsearch, Logstash, and Kibana) with FileBeat, or other similar frameworks. Alternatively, you can use Apache Ranger to manage audit logs. -- **Pipeline Limits**: Increase the number of allowed write pipelines to better suit your workload by adjusting `ozone.scm.datanode.pipeline.limit`. 
See the [Multi-Raft](./multi-raft) documentation for the formula to calculate appropriate pipeline limits based on your metadata disk configuration. +- **Pipeline Limits**: Increase the number of allowed write pipelines to better suit your workload by adjusting `ozone.scm.datanode.pipeline.limit`. See [Calculating Ratis Pipeline Limits](./calculating-ratis-pipeline-limits) for the sizing formula, trade-offs, and production guidance. See [Multi-Raft](./multi-raft) for Datanode metadata directory configuration. - **Heap Sizes**: Configure sufficient heap sizes for Ozone Manager (OM), Storage Container Manager (SCM), Recon, Datanode, S3 Gateway (S3G), and HttpFS services to ensure stability. diff --git a/versioned_docs/version-2.1.0/05-administrator-guide/02-configuration/04-performance/07-calculating-ratis-pipeline-limits.md b/versioned_docs/version-2.1.0/05-administrator-guide/02-configuration/04-performance/07-calculating-ratis-pipeline-limits.md index 216a0201ed..c6420d47d8 100644 --- a/versioned_docs/version-2.1.0/05-administrator-guide/02-configuration/04-performance/07-calculating-ratis-pipeline-limits.md +++ b/versioned_docs/version-2.1.0/05-administrator-guide/02-configuration/04-performance/07-calculating-ratis-pipeline-limits.md @@ -2,6 +2,8 @@ sidebar_label: Calculating Ratis Pipeline Limits --- +import RatisPipelineCalculator from '@site/src/components/RatisPipelineCalculator'; + # Calculating Ratis Pipeline Limits ReplicationFactor.THREE is controlled by three configuration properties that limit the @@ -51,10 +53,59 @@ And the configuration is: SCM will attempt to create and maintain approximately **24** open, FACTOR_THREE Ratis pipelines. -**Production Recommendation:** + + +## Sizing Trade-offs + +Pipeline limits balance write throughput against Datanode resource usage. Each open Ratis pipeline is one Ratis +replication group, so the number of pipelines directly affects how much concurrent write capacity the cluster +exposes. 
+ +**Benefits of more pipelines** + +- **Higher write parallelism**: Each pipeline is an independent Ratis group, so more pipelines allow more + concurrent write streams across the cluster. +- **Lower write latency under contention**: With more pipelines, different clients are less likely to share the + same Ratis group, reducing queueing on a single Raft leader. + +**Costs of too many pipelines** + +- **Datanode resource pressure**: Each pipeline consumes CPU, memory, and metadata-disk I/O on the Datanodes + that participate in it. Very high pipeline counts can overload individual nodes. +- **Follower lag and tail latency**: When a Datanode hosts many Raft groups, followers can fall behind on log + replication (index lag). This delays WATCH responses and increases write tail latency for pipelines on that + node. + +### Ozone vs HDFS write pipelines + +Ozone and HDFS take different approaches to grouping Datanodes for replicated writes: + +| | Ozone (Ratis pipelines) | HDFS (stateless pipelines) | +| --- | --- | --- | +| **Pipeline model** | SCM creates and tracks a bounded set of explicit Ratis pipelines | Any three Datanodes can form an ad hoc "pipeline" for three block replicas | +| **Control & visibility** | Centralized limits and monitoring via SCM (`ozone admin pipeline list`, SCM UI) | No equivalent cluster-wide pipeline registry | +| **Load spreading** | Bounded by configured pipeline limits; may cap peak write throughput | Blocks can spread across any DNs, spreading load more evenly | +| **Operational trade-off** | More predictable capacity planning; may sacrifice some peak throughput | Higher potential throughput; harder to observe and cap concurrent write groups | + +![Ozone Ratis pipelines vs HDFS stateless block placement](ratis-vs-hdfs-pipelines.png) + +## Production Recommendation + +For most production deployments, start with the dynamic per-disk limit (`ozone.scm.datanode.pipeline.limit=0`). 
+This lets pipeline capacity scale naturally with metadata disk resources. A good starting value for +`ozone.scm.pipeline.per.metadata.disk` is **2**. + +Increase pipeline limits only when you observe write contention—for example, many clients sharing the same +pipelines or sustained write queueing—and Datanode resources (CPU, memory, metadata-disk I/O) have headroom. +Use `ozone.scm.ratis.pipeline.limit` as a safety cap rather than the primary tuning knob. + +Monitor the **Pipeline Statistics** section in the SCM web UI, or run `ozone admin pipeline list`, to confirm +the actual number of open pipelines aligns with your configured targets. Watch for follower index lag or +elevated resource usage on Datanodes as signals that pipeline counts may be too high. + +## Related Topics -For most production deployments, using the dynamic per-disk limit (`ozone.scm.datanode.pipeline.limit=0`) is -recommended, as it allows the cluster to scale pipeline capacity naturally with its resources. You can use the -global limit (`ozone.scm.ratis.pipeline.limit`) as a safety cap if needed. A good starting value for -`ozone.scm.pipeline.per.metadata.disk` is **2**. Monitor the section **Pipeline Statistics** in SCM web UI, or run -the command `ozone admin pipeline list` to see if the actual number of pipelines aligns with your configured targets. +Limits on this page apply to **open Ratis pipelines** (ReplicationFactor.THREE). The number of OPEN containers +and containers per pipeline are separate concerns and are not covered here. For background on Multi-Raft and +Datanode metadata directories, see [Multi-Raft](./multi-raft). For EC pipeline sizing, see +[Calculating EC Pipeline Limits](./calculating-ec-pipeline-limits). 
diff --git a/versioned_docs/version-2.1.0/05-administrator-guide/02-configuration/04-performance/ratis-vs-hdfs-pipelines.png b/versioned_docs/version-2.1.0/05-administrator-guide/02-configuration/04-performance/ratis-vs-hdfs-pipelines.png new file mode 100644 index 0000000000..b37363f24e Binary files /dev/null and b/versioned_docs/version-2.1.0/05-administrator-guide/02-configuration/04-performance/ratis-vs-hdfs-pipelines.png differ
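
The sizing arithmetic that `calculateRatisPipelineLimit` applies in dynamic mode can be checked with a small standalone sketch. The helper name `estimatePipelines` is hypothetical and not part of this change. It assumes the same floor division by the replication factor, treats a global limit of 0 as "no cap", and uses the inputs from `createDefaultDiskGroups` (8 Datanodes with 4 metadata disks, 2 Datanodes with 2) with 2 pipelines per metadata disk and a global limit of 30.

```javascript
// Hypothetical standalone sketch; not part of the change above.
const REPLICATION_FACTOR = 3;

// Dynamic mode: each Datanode contributes (pipelines per disk * metadata disks) slots,
// and every FACTOR_THREE pipeline occupies one slot on three Datanodes.
function estimatePipelines({ perDisk, groups, globalLimit = 0 }) {
  const totalSlots = groups.reduce(
    (sum, group) => sum + group.datanodeCount * perDisk * group.metadataDisks,
    0,
  );
  const rawTarget = Math.floor(totalSlots / REPLICATION_FACTOR);
  // Assumption: a global limit of 0 means "no cap", as in the calculator.
  const finalTarget = globalLimit > 0 ? Math.min(rawTarget, globalLimit) : rawTarget;
  return { totalSlots, rawTarget, finalTarget };
}

const example = estimatePipelines({
  perDisk: 2,
  groups: [
    { datanodeCount: 8, metadataDisks: 4 },
    { datanodeCount: 2, metadataDisks: 2 },
  ],
  globalLimit: 30,
});

console.log(example);
```

With these inputs the slot total is 8 * 2 * 4 + 2 * 2 * 2 = 72, so the target is 72 / 3 = 24, which agrees with the **24** quoted in the documentation hunk. The global limit of 30 is not reached, so it leaves the result unchanged.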