From ec6da0b1ed53152ade92a7bf55f96a04836d5dd9 Mon Sep 17 00:00:00 2001
From: Wei-Chiu Chuang <weichiu@apache.org>
Date: Thu, 16 Apr 2026 15:51:55 -0700
Subject: [PATCH 1/2] HDDS-15047. [Docs] System internals: chunk and stripe
 checksums

Document chunk-level and EC stripe checksums, getFileChecksum hierarchy,
S3 ETags, combine modes, and key client/OM/S3G components. Link checksum
client properties to the configuration appendix; publish the data integrity page.

Made-with: Cursor
---
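Note on the "Chunk-level checksumming" section added by this patch: the per-segment scheme it describes (one checksum entry per `ozone.client.bytes.per.checksum` segment on write, recompute and compare on read) can be sketched roughly as below. This is a minimal illustration built on the JDK's `java.util.zip.CRC32`, not Ozone's `Checksum` class; the class and method names (`SegmentChecksumSketch`, `computeSegmentCrcs`, `firstMismatch`) are hypothetical and exist only for this example.

```java
import java.util.zip.CRC32;

// Hypothetical illustration only; this is NOT Ozone's Checksum implementation.
final class SegmentChecksumSketch {

    // Write side: one CRC32 value per bytesPerChecksum-sized segment of the chunk.
    // The last segment may be shorter than bytesPerChecksum.
    static long[] computeSegmentCrcs(byte[] chunk, int bytesPerChecksum) {
        if (bytesPerChecksum <= 0) {
            throw new IllegalArgumentException("bytesPerChecksum must be positive");
        }
        int segments = (chunk.length + bytesPerChecksum - 1) / bytesPerChecksum;
        long[] crcs = new long[segments];
        CRC32 crc = new CRC32();
        for (int i = 0; i < segments; i++) {
            int offset = i * bytesPerChecksum;
            int length = Math.min(bytesPerChecksum, chunk.length - offset);
            crc.reset();
            crc.update(chunk, offset, length);
            crcs[i] = crc.getValue();
        }
        return crcs;
    }

    // Read side: recompute the segment CRCs and return the index of the first
    // segment that differs from the stored values, or -1 if everything matches.
    static int firstMismatch(byte[] received, int bytesPerChecksum, long[] expected) {
        long[] actual = computeSegmentCrcs(received, bytesPerChecksum);
        int common = Math.min(actual.length, expected.length);
        for (int i = 0; i < common; i++) {
            if (actual[i] != expected[i]) {
                return i;
            }
        }
        // A differing segment count is also treated as a mismatch.
        return actual.length == expected.length ? -1 : common;
    }

    public static void main(String[] args) {
        byte[] chunk = new byte[40 * 1024];              // a 40 KB chunk
        long[] stored = computeSegmentCrcs(chunk, 16 * 1024);
        System.out.println("segments: " + stored.length);

        byte[] corrupted = chunk.clone();
        corrupted[20000] ^= 0x01;                        // flip one bit in the second segment
        System.out.println("first bad segment: "
            + firstMismatch(corrupted, 16 * 1024, stored));
    }
}
```

In this sketch a mismatch is reported as a segment index; per the documentation in this patch, the real client raises an `OzoneChecksumException` at that point instead.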
 .../03-data-integrity/01-checksums.md         | 99 ++++++++++++++++++-
 1 file changed, 95 insertions(+), 4 deletions(-)

diff --git a/docs/07-system-internals/03-data-integrity/01-checksums.md b/docs/07-system-internals/03-data-integrity/01-checksums.md
index da00e08f71..8b5ad8b0ee 100644
--- a/docs/07-system-internals/03-data-integrity/01-checksums.md
+++ b/docs/07-system-internals/03-data-integrity/01-checksums.md
@@ -1,10 +1,101 @@
 ---
-draft: true
 sidebar_label: Checksums
 ---
 
-# Checksum System Internals
+# Read and write checksums for chunks and stripes
 
-**TODO:** File a subtask under [HDDS-9862](https://issues.apache.org/jira/browse/HDDS-9862) and complete this page or section.
+Apache Ozone protects data integrity with checksums across the data lifecycle. Checksums are computed at **chunk** granularity for all data layouts and, for Erasure Coded (EC) data, also at **stripe** granularity. Together they let Ozone detect corruption and drive recovery at several layers.
 
-Document the internal implementations of read and write checksums for chunks and stripes.
+## Chunk-level checksumming
+
+Chunks are the smallest unit of data transfer and storage in Ozone (commonly **4&nbsp;MB**). Each chunk carries one or more checksums.
+
+### Write path (calculation)
+
+Checksums are computed mainly on the client inside **`BlockOutputStream`**.
+
+- **Mechanism:** While buffering data, the client uses **`org.apache.hadoop.ozone.common.Checksum`** to compute checksums.
+- **Granularity:** Controlled by [`ozone.client.bytes.per.checksum`](../../administrator-guide/configuration/appendix). Data is split into segments; each segment gets one checksum entry.
+- **Algorithms:** CRC32, CRC32C, SHA256, and MD5.
+- **Storage:** Checksums live in the **`checksumData`** field (type **`ChecksumData`**) of the **`ChunkInfo`** protobuf and are sent to the Datanode on the **WriteChunk** request.
+
+### Read path (verification)
+
+The client verifies checksums on read for end-to-end integrity.
+
+- **Mechanism:** **`ChunkInputStream`** recomputes checksums for received segments and compares them to the values in **`ChunkInfo`** metadata.
+- **Failure handling:** On mismatch, an **`OzoneChecksumException`** is thrown. For replicated data the client retries another replica; for EC it triggers reconstruction.
+
+---
+
+## Stripe-level checksumming (erasure coding)
+
+Erasure coding adds **stripe** checksums on top of per-chunk checksums to protect each stripe (one chunk in each of the **d** data blocks and **p** parity blocks).
+
+### Write path (calculation)
+
+In **`ECKeyOutputStream`**, after a stripe is encoded:
+
+- **Calculation:** **`ECBlockOutputStreamEntry.calculateChecksum()`** concatenates the chunk-level checksums of every chunk in the stripe (data and parity).
+- **Storage:** The resulting stripe checksum is stored in **`stripeChecksum`** on **`ChunkInfo`** and sent to Datanodes on **PutBlock**.
+
+### Usage
+
+- **File checksum API:** `getFileChecksum()` derives block-group-level checksums from the stored stripe checksums without re-reading data bytes.
+- **Datanode reconstruction:** Datanodes use stripe checksums to validate stripe integrity during reconciliation or offline recovery.
+
+---
+
+## Hadoop file checksum (`getFileChecksum()`)
+
+Ozone implements the Hadoop **`getFileChecksum()`** API so clients can compare file integrity across Hadoop-compatible filesystems.
+
+- **Hierarchy:**
+  1. **File checksum** — combines checksums of all blocks in the file.
+  2. **Block checksum** — combines checksums of all chunks in the block.
+  3. **Chunk checksum** — base CRC/hash for each **`ozone.client.bytes.per.checksum`** segment.
+- **Implementation:** The client (via **`FileChecksumHelper`**) loads **`ChunkInfo`** for all blocks from Datanodes or OM metadata and combines checksums locally, so **data bytes are not read** and the call stays fast.
+- **Combination logic:** [`ozone.client.checksum.combine.mode`](../../administrator-guide/configuration/appendix) selects how levels are merged (for example **`COMPOSITE_CRC`** or **`MD5MD5CRC`**).
+
+---
+
+## S3 ETags
+
+For S3 compatibility, Ozone stores an **ETag** (entity tag) per object.
+
+- **Simple uploads:** For **PutObject**, the S3 Gateway computes **MD5** over the object stream as it is written to Ozone and stores it in key metadata as the ETag.
+- **Multipart uploads:**
+  1. Each part gets an ETag (MD5 of that part).
+  2. On completion, **OM** builds a final ETag from the part MD5s (concatenate part digests, then MD5 that byte string), matching common S3 multipart behavior.
+  3. The final ETag is often suffixed with the part count (for example **`…-N`**, where **N** is the number of parts).
+- **Storage:** ETags live in key metadata in **OM** and are returned on HEAD/GET through the S3 Gateway.
+
+---
+
+## Configuration and compatibility
+
+| Property | Default | Description |
+| -------- | ------- | ----------- |
+| [`ozone.client.checksum.type`](../../administrator-guide/configuration/appendix) | `CRC32` | Algorithm used for chunk checksums. HDFS defaults to **CRC32C**, so set **CRC32C** in Ozone when you need comparable checksums for HDFS→Ozone migrations (for example DistCp integrity checks). |
+| [`ozone.client.bytes.per.checksum`](../../administrator-guide/configuration/appendix) | `16KB` | Segment size for chunk checksums. Minimum allowed is **8KB**. |
+| [`ozone.client.checksum.combine.mode`](../../administrator-guide/configuration/appendix) | `COMPOSITE_CRC` | How chunk/block checksums are merged for the file checksum API. |
+| [`ozone.client.verify.checksum`](../../administrator-guide/configuration/appendix) | `true` | Enables or disables client-side verification on read. |
+
+### Checksum combine modes
+
+- **`COMPOSITE_CRC` (default):** Builds a file-level CRC by mathematically combining lower-level chunk/block CRCs. The result does not depend on how the file is split into blocks, and no data bytes need to be re-read.
+- **`MD5MD5CRC`:** MD5 of the MD5 values of chunk checksums; useful for **legacy HDFS** compatibility.
+
+---
+
+## Key components
+
+| Component | Responsibility |
+| --------- | ---------------- |
+| `Checksum.java` | Core hashing (CRC32C, MD5, and so on). |
+| `ChunkInfo` (protobuf) | Carries **`checksumData`** (per chunk) and **`stripeChecksum`** (per stripe). |
+| `BlockOutputStream` | Client-side chunk checksums on write. |
+| `ChunkInputStream` | Client-side chunk checksum verification on read. |
+| `FileChecksumHelper` | Implements **`getFileChecksum()`** by aggregating metadata. |
+| S3 Gateway **`ObjectEndpoint`** | MD5-based ETags for **PutObject**. |
+| **OM** (S3 multipart **CompleteMultipartUpload** handling) | Final multipart ETag from part ETags. |
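Note on the "S3 ETags" section above: the multipart rule it states (MD5 each part, concatenate the part digests, MD5 that byte string, append the part count) can be sketched as follows. This is a minimal illustration using the JDK's `MessageDigest`, and it assumes the common S3 convention of concatenating the raw 16-byte digests rather than their hex strings; the OM code path may differ in detail. `MultipartEtagSketch` and its methods are hypothetical names, not Ozone classes.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

// Hypothetical illustration only; this is NOT the OM or S3 Gateway implementation.
final class MultipartEtagSketch {

    private static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // ETag of a simple upload or of a single part: hex MD5 of the bytes.
    static String partEtag(byte[] data) {
        return toHex(newMd5().digest(data));
    }

    // Final multipart ETag: MD5 over the concatenated raw part digests,
    // followed by "-" and the number of parts.
    static String multipartEtag(List<byte[]> parts) {
        MessageDigest outer = newMd5();
        for (byte[] part : parts) {
            outer.update(newMd5().digest(part));
        }
        return toHex(outer.digest()) + "-" + parts.size();
    }

    public static void main(String[] args) {
        List<byte[]> parts = List.of("part one".getBytes(), "part two".getBytes());
        System.out.println(multipartEtag(parts));
    }
}
```

Because the final value is a digest of digests, a multipart ETag generally differs from the MD5 of the whole object, so it should not be compared against a plain MD5 of the object's content.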

From 96e02f868162a051fd7b5faa34c1dbea5881ea53 Mon Sep 17 00:00:00 2001
From: Wei-Chiu Chuang <weichiu@apache.org>
Date: Thu, 16 Apr 2026 16:08:34 -0700
Subject: [PATCH 2/2] spelling

---
 cspell.yaml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/cspell.yaml b/cspell.yaml
index 62d88fdb4a..416da0cd58 100644
--- a/cspell.yaml
+++ b/cspell.yaml
@@ -73,6 +73,7 @@ words:
 - ASF
 - PMC
 # Ozone specific words
+- checksumming
 - HDDS
 - ratis
 - OM