HDDS-15526. Add security threat model (THREAT_MODEL.md + SECURITY.md + AGENTS.md)#10483

Open
potiuk wants to merge 4 commits into apache:master from potiuk:asf-security/threat-model-2026-06-10

@potiuk potiuk commented Jun 10, 2026

Member

https://issues.apache.org/jira/browse/HDDS-15526

What

Adds a threat model for Apache Ozone, drafted at the Ozone PMC's request (the GLASSWING / Mythos scan pre-flight needs a discoverable threat model), plus the discoverability chain:

  • THREAT_MODEL.md — the model, following Michael Scovetta's rubric (public mirror).
  • SECURITY.md — your existing policy, preserved, with a Threat Model pointer appended.
  • AGENTS.md — routes a vulnerability-research agent through AGENTS.md -> SECURITY.md -> THREAT_MODEL.md.

The model in brief

Ozone is modelled as a cluster of network services (S3 Gateway, OM, SCM/internal-CA, Datanodes/Ratis, Recon) with distinct actors: untrusted client, authenticated-but-unauthorized user, operator, service peer, and a bounded-Byzantine datanode. The load-bearing knob is secure mode (ozone.security.enabled): findings that only manifest in non-secure (dev) mode are out of model. The model makes explicit that the KDC, Ranger policy correctness, the SCM CA private key, KMS keys, and network isolation are operator responsibilities — so scanner/AI reports against those route to "operator-owned" rather than churning.

DRAFT — you own and merge it

Most claims are tagged (documented) from the source/SECURITY.md; the architectural assumptions I marked (inferred) are gathered as open questions in section 14. The two that most shape the model:

  • Q-secure — confirm secure mode is the supported production posture (and whether the S3 Gateway ever supports intended anonymous access).
  • Q-ratis — the Ratis honest-majority safety bound you stand behind, and whether there's an independent block/container integrity check so a single Byzantine datanode can't serve corrupted data undetected.

Please edit freely. Once merged + discoverable, pre-flight passes and we queue the scan (no deadline pressure — the window is being extended as the ASF moves to Mythos 5).

Generated by the ASF Security team's threat-model tooling (Claude Opus); reviewed before opening.

@szetszwo szetszwo requested a review from kerneltime June 10, 2026 22:09
@potiuk potiuk changed the title from "Add security threat model (THREAT_MODEL.md + SECURITY.md pointer + AGENTS.md)" to "HDDS-15526. Add security threat model (THREAT_MODEL.md + SECURITY.md + AGENTS.md)" Jun 11, 2026
….md/SECURITY.md

Reconciled against master's now-merged AGENTS.md (HDDS-15316) and SECURITY.md:
adds THREAT_MODEL.md and appends a ## Security discoverability section to the
existing AGENTS.md (AGENTS.md -> SECURITY.md -> THREAT_MODEL.md chain) plus a
## Threat Model pointer to SECURITY.md, rather than replacing either file.

Generated-by: Claude Opus 4.8 (1M context)
@potiuk potiuk force-pushed the asf-security/threat-model-2026-06-10 branch from db9a5af to 109c0a9 Compare June 11, 2026 15:00
@smengcl smengcl requested a review from adoroszlai June 12, 2026 18:15

smengcl commented Jun 15, 2026

Contributor

Thanks @potiuk for raising this. I will try to review and revise this draft by the end of the week.

smengcl commented Jun 23, 2026

Contributor
  • Q-secure — confirm secure mode is the supported production posture

Yes.

(and whether the S3 Gateway ever supports intended anonymous access).

When security is enabled, no, anonymous access will be rejected. (At least there is no plan for that right now, see https://issues.apache.org/jira/browse/HDDS-7961)

  • Q-ratis — the Ratis honest-majority safety bound you stand behind, and whether there's an independent block/container integrity check so a single Byzantine datanode can't serve corrupted data undetected.

Ratis gives standard Raft safety under an honest majority, e.g. 2 of 3 replicas for RATIS THREE. It is not Byzantine fault tolerant.

Ozone has checksum verification for normal reads and replica/container checks, so ordinary single-replica corruption is detected. But I would not claim a full guarantee against a Byzantine datanode that can forge both data and metadata on the path it serves.
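
For readers less familiar with the honest-majority bound mentioned above, the quorum arithmetic of a Raft group can be sketched as below. This is generic Raft arithmetic, not code from Ozone or Ratis, and the function names are hypothetical.

```python
def raft_quorum(replicas: int) -> int:
    """Smallest majority of a Raft group with `replicas` members."""
    if replicas < 1:
        raise ValueError("a Raft group needs at least one replica")
    return replicas // 2 + 1


def crash_faults_tolerated(replicas: int) -> int:
    """Crashed (non-Byzantine) replicas the group survives while keeping a quorum."""
    return replicas - raft_quorum(replicas)


# Three-way Ratis replication: a quorum of 2, so one crashed replica is tolerated.
print(raft_quorum(3), crash_faults_tolerated(3))
```

These counts only cover replicas that stop or lag. A replica that actively lies is outside what Raft promises, which is the "not Byzantine fault tolerant" caveat in the answer above.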

… (S3 GW rejects anon); Ratis honest-majority safety, not BFT

Generated-by: Claude Opus 4.8 (1M context)

potiuk commented Jun 24, 2026

Member Author

Thanks @smengcl — both answers folded into THREAT_MODEL.md (just pushed):

  • Q-secure → confirmed. Secure mode is now stated as the supported production posture (§3/§5a/§10), so non-secure-mode findings are OUT-OF-MODEL: non-default-build. For the S3 Gateway specifically I recorded that with security enabled, anonymous access is rejected (no plan otherwise, citing HDDS-7961) — so an "unauthenticated S3 request accepted in secure mode" finding is VALID, not a disclaimed mode.

  • Q-ratis → confirmed. §7/§8/§9 now state Ratis gives standard Raft safety under an honest majority (2 of 3 for RATIS THREE) and is not BFT. I also captured the integrity nuance you gave: checksum verification on normal reads + replica/container checks detect ordinary single-replica corruption, but there's no full guarantee against a Byzantine datanode that forges both data and metadata on the path it serves — that case is explicitly out of model.

Both (inferred) tags on those points are now (maintainer). No rush on the remaining wave-2/3 questions (authz default, token lifetimes, TDE/CSI/Recon scope) — whenever you get to them.
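
The triage rules summarised in this comment could be expressed roughly as follows. This is only a sketch under stated assumptions: the `Finding` fields and the `NEEDS-REVIEW` fallback are hypothetical and do not come from THREAT_MODEL.md or any ASF tooling, and the thread gives no specific label for the Byzantine case beyond "out of model".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical shape of a scanner report; not a real tool's schema."""
    secure_mode: bool                       # reproduced with ozone.security.enabled=true?
    anonymous_s3_accepted: bool = False     # S3 Gateway served an unauthenticated request
    needs_byzantine_datanode: bool = False  # relies on a datanode forging data and metadata


def triage(finding: Finding) -> str:
    """Map a finding to a disposition, following the rules stated in the thread."""
    if not finding.secure_mode:
        # Findings that only manifest in non-secure (dev) mode are out of model.
        return "OUT-OF-MODEL: non-default-build"
    if finding.needs_byzantine_datanode:
        # Raft safety assumes an honest majority; this case is out of model.
        return "OUT-OF-MODEL"
    if finding.anonymous_s3_accepted:
        # Secure mode must reject anonymous S3 access, so this is a real bug.
        return "VALID"
    return "NEEDS-REVIEW"  # hypothetical fallback label, not from the thread


print(triage(Finding(secure_mode=True, anonymous_s3_accepted=True)))
```

Checking secure mode first mirrors the description of `ozone.security.enabled` as the load-bearing knob: nothing else is evaluated for a report that only reproduces in dev mode.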

9 comment threads on THREAT_MODEL.md (1 marked Outdated)
Folds Wei-Chiu Chuang's 2026-06-25 review into THREAT_MODEL.md:
- §5a: ACL (ozone.acl.enabled=false), block tokens
  (hdds.block.token.enabled=false), and TDE/KMS are off by default even
  in secure mode (answers Q-authz/Q-token/Q-tde).
- §3: CSI driver out of scope (not production-ready); Recon in scope;
  S3 anonymous-rejection made explicit + future web-hosting caveat.
- §7: cross-reference ozone-site#397 checksum doc.
- §10: protect OM/SCM/Recon RocksDB at rest; isolate KMS; track a
  production secure-deployment checklist.

Generated-by: Claude Opus 4.8 (1M context)

potiuk commented Jun 26, 2026

Member Author

@jojochuang — thanks for the thorough review. All nine points are folded into THREAT_MODEL.md (commit 7ee8791) and I've resolved the threads. Summary:

§5a — default-state baseline (the important one):

  • ACL checks off by default (ozone.acl.enabled=false); Native ACL is the default once enabled.
  • Block tokens off by default (hdds.block.token.enabled=false).
  • TDE and KMS optional, disabled by default; gRPC TLS (hdds.grpc.tls.enabled) is a separate wire-encryption setting, also off by default.

These now answer Q-authz / Q-token / Q-tde and reset the "default build" baseline — a finding that assumes ACLs/tokens/TDE are on in a stock install is OUT-OF-MODEL: non-default-build.
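
One way to check a report against this baseline is sketched below. The two keys and their default values are the ones quoted in this comment; the helper name and the idea of a "required settings" map are assumptions made for illustration. Only the ACL and block-token keys are included.

```python
# Stock-install defaults as stated in this thread.
STOCK_DEFAULTS = {
    "ozone.acl.enabled": False,
    "hdds.block.token.enabled": False,
}


def assumes_non_default(required: dict[str, bool]) -> list[str]:
    """Return the keys a finding needs set differently from the stock defaults.

    Keys that are not in STOCK_DEFAULTS are ignored rather than guessed at.
    """
    return sorted(
        key
        for key, value in required.items()
        if STOCK_DEFAULTS.get(key, value) != value
    )


# A finding that presumes ACL enforcement is on depends on a non-default setting.
print(assumes_non_default({"ozone.acl.enabled": True}))
```

A non-empty result suggests the report depends on a non-default configuration, which under the rule above would route it to OUT-OF-MODEL: non-default-build.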

§3 — scope:

  • CSI driver out of scope (not production-ready); Recon in scope as part of the production cluster.
  • Secure-mode S3 anonymous rejection made explicit, with the future S3 web-hosting opt-in-anonymous caveat noted.

§7: cross-referenced the ozone-site#397 checksum doc.

§10 — operator hardening:

  • Protect OM/SCM/Recon RocksDB at rest (restrictive perms + ideally on-disk encryption).
  • Isolate the KMS in a separate, firewalled network segment.
  • Tracking a consolidated production secure-deployment checklist for the Ozone docs.

Shout if I've mis-stated anything — happy to iterate.


3 participants