feat: enforce per-field structural caps at file-parse time (#32) #38

Merged: LiranCohen merged 1 commit into master from feat/per-field-caps on Jun 4, 2026
@LiranCohen

Contributor

Summary

Sidetree caps several per-field sizes when parsing the index/proof/chunk files; our reader enforced none of them. This adds the caps the reference implementation actually enforces at parse time, and — guided by an adversarial review against decentralized-identity/sidetree v1.0.6 — deliberately omits two it does not, to avoid rejecting anchors the network accepts. Plan: docs/plans/2026-06-04-001-feat-ion-value-locking-protocol-rules-plan.md.

Enforced

| Field | Cap | Where |
| --- | --- | --- |
| writerLockId | MaxWriterLockIDInBytes (200) | core index |
| coreProofFileUri, provisionalIndexFileUri | MaxCASURILength (100) | core index |
| provisionalProofFileUri, chunkFileUri | MaxCASURILength (100) | provisional index |
| operation delta (JCS-canonical) | MaxDeltaSizeInBytes (1000) | chunk |

The delta size is measured on the raw on-wire bytes of each delta, not on a re-marshaled did.Delta (which drops unknown fields and would undercount). This matches the reference, which canonicalizes the parsed object including unknown fields. Violations route through classifyMalformed → ErrMalformed (permanent skip). New sentinels: ErrCASURITooLong, ErrWriterLockIDTooLong, ErrDeltaTooLarge.

Deliberately NOT enforced (would diverge from the reference — over-rejection)

An adversarial review caught two over-rejections in the first cut; both removed:

  • Reveal value length — the reference validates the reveal as a supported-algorithm multihash, not by length (maxEncodedRevealValueLength is unused there). The SHA-256 algorithm is enforced downstream by ion-sdk-go's did.CheckReveal at apply time. A parse-time length cap here would permanently reject valid reveals (one-directional, fatal divergence).
  • Anchor core-index CID length — the reference applies maxCasUriLength only to embedded URIs; a valid long CIDv1 must not be rejected.

So the SHA-256-only multihash rule (hashAlgorithmsInMultihashCode=[18]) is satisfied downstream (did.CheckReveal; suffixes/commitments computed with SHA-256), not re-validated at parse here.

Known follow-up (pre-existing, out of scope)

The reference rejects unknown top-level JSON properties on every file type (validateObjectContainsOnlyAllowedProperties); our struct-based parsing silently drops them (a systemic accept-what-ION-rejects gap, not introduced here). Worth a dedicated strict-schema pass.
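
One possible shape for that strict-schema pass, sketched with the standard library only: `json.Decoder.DisallowUnknownFields` makes decoding fail when the input carries a key the target struct does not declare. The `provisionalIndexFile` struct here is a simplified, hypothetical layout, not the real file schema, and `decodeStrict` is an illustrative helper name.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// provisionalIndexFile is a hypothetical, flattened view used for illustration.
type provisionalIndexFile struct {
	ProvisionalProofFileURI string `json:"provisionalProofFileUri"`
	ChunkFileURI            string `json:"chunkFileUri"`
}

// decodeStrict fails on any JSON key that v does not declare.
// Note that the option also applies to nested structs, not only the top level.
func decodeStrict(data []byte, v any) error {
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	return dec.Decode(v)
}

func main() {
	var f provisionalIndexFile
	clean := []byte(`{"provisionalProofFileUri":"Qm1","chunkFileUri":"Qm2"}`)
	extra := []byte(`{"provisionalProofFileUri":"Qm1","chunkFileUri":"Qm2","bogus":1}`)
	fmt.Println("clean:", decodeStrict(clean, &f))
	fmt.Println("extra:", decodeStrict(extra, &f))
}
```

A real pass would also need to decide how to handle trailing data after the first JSON value, which `Decode` alone does not reject.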

Testing

  • TestProcessorPerFieldCaps — writerLockId / embedded-URI reject and at-cap (200) accept.
  • TestProcessorProvisionalFieldCaps — provisional proof / chunk URI caps.
  • TestProcessorDeltaSizeCap + TestProcessorDeltaSizeCapCountsUnknownFields — the raw-bytes anti-bypass (a delta small once parsed but large on the wire is still rejected).
  • TestProcessorAcceptsRealisticAnchor: the over-rejection regression guard. A realistic batch (59-char CIDv1 anchor CID, reveal values longer than the removed 50-byte cap, a real ~900-byte delta) processes cleanly.
  • gofmt clean; go build, go vet, go test -race -count=1 ./... green. go.mod: gowebpki/jcs promoted to a direct dependency.

Post-Deploy Monitoring & Validation

  • What to monitor: observer ErrMalformed skips carrying ErrCASURITooLong / ErrWriterLockIDTooLong / ErrDeltaTooLarge.
  • Expected healthy signal: effectively zero on ION mainnet — canonical fields are well under cap. The removed reveal/anchor-CID caps mean no over-rejection of long-but-valid reveals or CIDs.
  • Failure signal / rollback trigger: any of these on a known-good mainnet anchor → revert and investigate the boundary.
  • Validation window & owner: first full mainnet resync post-deploy; owner = indexer maintainer.

🤖 Generated with Claude Code

Sidetree caps several per-field sizes when parsing the index/proof/chunk files;
our reader enforced none of them. Add the caps the reference implementation
actually enforces at parse time:

  - writerLockId <= maxWriterLockIdInBytes (200), in the core index file.
  - embedded CAS URIs <= maxCasUriLength (100): coreProofFileUri,
    provisionalIndexFileUri (core index) and provisionalProofFileUri,
    chunkFileUri (provisional index).
  - each operation delta's canonicalized (JCS) size <= maxDeltaSizeInBytes (1000),
    measured on the RAW on-wire bytes of the delta (not a re-marshaled did.Delta,
    which drops unknown fields and would undercount), matching the reference which
    canonicalizes the parsed delta object including unknown fields.

Violations route through classifyMalformed -> ErrMalformed (permanent skip). New
sentinels ErrCASURITooLong, ErrWriterLockIDTooLong, ErrDeltaTooLarge. Resolves the
core-index, provisional-index, and chunk file-size TODOs (size itself is enforced
at fetch, #31; this adds the per-field caps).

Deliberately NOT enforced, to avoid rejecting anchors the reference accepts
(verified against decentralized-identity/sidetree v1.0.6 during an adversarial
review):

  - The reveal value is NOT length-checked. The reference validates it as a
    supported-algorithm multihash, not by length (maxEncodedRevealValueLength is
    unused in the reference); the SHA-256 algorithm is enforced downstream by
    ion-sdk-go's did.CheckReveal at operation-apply time. A length cap here would
    permanently reject valid reveals (one-directional, fatal divergence).
  - The anchor's own core-index CID is NOT length-checked. The reference applies
    maxCasUriLength only to EMBEDDED URIs; a valid long CIDv1 must not be rejected.

The SHA-256-only multihash rule (hashAlgorithmsInMultihashCode=[18]) is therefore
satisfied downstream (did.CheckReveal; suffixes/commitments computed with SHA-256),
not re-validated at parse here.

Known follow-up (pre-existing, out of scope): the reference rejects unknown
top-level JSON properties on every file type
(validateObjectContainsOnlyAllowedProperties); our struct-based parsing silently
drops them. Tracked separately.

Tested: TestProcessorPerFieldCaps (writerLockId/embedded-URI reject + at-cap
accept), TestProcessorProvisionalFieldCaps, TestProcessorDeltaSizeCap,
TestProcessorDeltaSizeCapCountsUnknownFields (the raw-bytes anti-bypass), and
TestProcessorAcceptsRealisticAnchor (a realistic batch with a 59-char CIDv1, long
reveal values, and a real delta processes cleanly — the over-rejection regression
guard). go.mod: gowebpki/jcs promoted to a direct dependency.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@LiranCohen LiranCohen merged commit 2e978de into master Jun 4, 2026
1 check passed
@LiranCohen LiranCohen deleted the feat/per-field-caps branch June 4, 2026 16:40