feat: enforce per-field structural caps at file-parse time (#32) #38
Merged
Sidetree caps several per-field sizes when parsing the index/proof/chunk files;
our reader enforced none of them. Add the caps the reference implementation
actually enforces at parse time:
- writerLockId <= maxWriterLockIdInBytes (200), in the core index file.
- embedded CAS URIs <= maxCasUriLength (100): coreProofFileUri,
provisionalIndexFileUri (core index) and provisionalProofFileUri,
chunkFileUri (provisional index).
- each operation delta's canonicalized (JCS) size <= maxDeltaSizeInBytes (1000),
measured on the RAW on-wire bytes of the delta (not a re-marshaled did.Delta,
which drops unknown fields and would undercount). This matches the reference,
which canonicalizes the parsed delta object including unknown fields.
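
To illustrate why the delta is measured on raw bytes, here is a minimal sketch using only the standard library. It is not the PR's code: knownDelta is a hypothetical stand-in for did.Delta, paddedDelta is a made-up helper, and json.Compact stands in for JCS canonicalization (the real check would canonicalize with gowebpki/jcs).

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

const maxDeltaSizeInBytes = 1000 // cap value quoted in the description

// knownDelta is a hypothetical stand-in for did.Delta: it models only the
// fields the parser knows, so anything else is dropped on unmarshal.
type knownDelta struct {
	Patches          json.RawMessage `json:"patches"`
	UpdateCommitment string          `json:"updateCommitment"`
}

// rawSize measures the delta as it appeared on the wire. json.Compact is an
// assumption standing in for JCS: it strips whitespace but keeps every field.
func rawSize(raw []byte) (int, error) {
	var buf bytes.Buffer
	if err := json.Compact(&buf, raw); err != nil {
		return 0, err
	}
	return buf.Len(), nil
}

// remarshaledSize measures the delta after a round trip through the struct,
// which silently discards unknown fields.
func remarshaledSize(raw []byte) (int, error) {
	var d knownDelta
	if err := json.Unmarshal(raw, &d); err != nil {
		return 0, err
	}
	out, err := json.Marshal(d)
	if err != nil {
		return 0, err
	}
	return len(out), nil
}

// paddedDelta builds a delta whose bulk sits in a field the struct ignores.
func paddedDelta(padding int) []byte {
	var buf bytes.Buffer
	buf.WriteString(`{"patches":[],"updateCommitment":"EiA","junk":"`)
	buf.Write(bytes.Repeat([]byte("x"), padding))
	buf.WriteString(`"}`)
	return buf.Bytes()
}

func main() {
	raw := paddedDelta(2000)
	onWire, _ := rawSize(raw)
	roundTrip, _ := remarshaledSize(raw)
	fmt.Println("raw over cap:", onWire > maxDeltaSizeInBytes)
	fmt.Println("re-marshaled over cap:", roundTrip > maxDeltaSizeInBytes)
}
```

A size check on the round-tripped value would let the padded delta through, which is the bypass the raw-bytes measurement closes.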
Violations route through classifyMalformed -> ErrMalformed (permanent skip). New
sentinels ErrCASURITooLong, ErrWriterLockIDTooLong, ErrDeltaTooLarge. Resolves the
core-index, provisional-index, and chunk file-size TODOs (size itself is enforced
at fetch, #31; this adds the per-field caps).
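
The routing described above could look roughly like the sketch below. The cap values and sentinel names come from this description; everything else is an assumption for illustration, including the malformedError type, the body of classifyMalformed, the helper names, and the use of len (byte length) for both caps.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Cap values as quoted in the description; the Go constant names are illustrative.
const (
	maxWriterLockIDInBytes = 200
	maxCASURILength        = 100
)

var (
	ErrMalformed           = errors.New("malformed anchor file")
	ErrCASURITooLong       = errors.New("CAS URI too long")
	ErrWriterLockIDTooLong = errors.New("writerLockId too long")
)

// malformedError (hypothetical) tags a cap violation as ErrMalformed while
// keeping the specific sentinel reachable through Unwrap.
type malformedError struct{ cause error }

func (e malformedError) Error() string        { return "malformed: " + e.cause.Error() }
func (e malformedError) Unwrap() error        { return e.cause }
func (e malformedError) Is(target error) bool { return target == ErrMalformed }

// classifyMalformed is a stand-in for the real function of the same name.
func classifyMalformed(err error) error {
	if err == nil {
		return nil
	}
	return malformedError{cause: err}
}

// checkCASURI rejects an embedded CAS URI longer than the cap.
func checkCASURI(field, uri string) error {
	if len(uri) > maxCASURILength {
		return fmt.Errorf("%s is %d bytes, cap %d: %w", field, len(uri), maxCASURILength, ErrCASURITooLong)
	}
	return nil
}

// checkWriterLockID rejects a writerLockId longer than the cap.
func checkWriterLockID(id string) error {
	if len(id) > maxWriterLockIDInBytes {
		return fmt.Errorf("writerLockId is %d bytes, cap %d: %w", len(id), maxWriterLockIDInBytes, ErrWriterLockIDTooLong)
	}
	return nil
}

func main() {
	// A value exactly at the cap is accepted.
	atCap := strings.Repeat("a", maxWriterLockIDInBytes)
	fmt.Println(classifyMalformed(checkWriterLockID(atCap)) == nil)

	// One byte over the cap matches both the generic and the specific sentinel.
	tooLong := strings.Repeat("b", maxCASURILength+1)
	err := classifyMalformed(checkCASURI("chunkFileUri", tooLong))
	fmt.Println(errors.Is(err, ErrMalformed), errors.Is(err, ErrCASURITooLong))
}
```

Keeping the specific sentinel in the chain lets callers treat every violation as a permanent skip while logs and metrics still distinguish which cap was hit.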
Deliberately NOT enforced, to avoid rejecting anchors the reference accepts
(verified against decentralized-identity/sidetree v1.0.6 during an adversarial
review):
- The reveal value is NOT length-checked. The reference validates it as a
supported-algorithm multihash, not by length (maxEncodedRevealValueLength is
unused in the reference); the SHA-256 algorithm is enforced downstream by
ion-sdk-go's did.CheckReveal at operation-apply time. A length cap here would
permanently reject valid reveals (one-directional, fatal divergence).
- The anchor's own core-index CID is NOT length-checked. The reference applies
maxCasUriLength only to EMBEDDED URIs; a valid long CIDv1 must not be rejected.
The SHA-256-only multihash rule (hashAlgorithmsInMultihashCode=[18]) is therefore
satisfied downstream (did.CheckReveal; suffixes/commitments computed with SHA-256),
not re-validated at parse here.
Known follow-up (pre-existing, out of scope): the reference rejects unknown
top-level JSON properties on every file type
(validateObjectContainsOnlyAllowedProperties); our struct-based parsing silently
drops them. Tracked separately.
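
For context on that follow-up, the sketch below shows the behaviour in question with the standard library alone. It is not part of this change: coreIndexFile is a cut-down hypothetical struct, and parseLenient/parseStrict are illustrative names. Note that Decoder.DisallowUnknownFields also applies to nested structs, so it is stricter than a top-level-only check.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// coreIndexFile is a cut-down, hypothetical view of a core index file.
type coreIndexFile struct {
	WriterLockID            string `json:"writerLockId,omitempty"`
	ProvisionalIndexFileURI string `json:"provisionalIndexFileUri,omitempty"`
}

// parseLenient mirrors plain struct-based parsing: unknown properties are
// dropped without an error.
func parseLenient(raw []byte) (coreIndexFile, error) {
	var f coreIndexFile
	err := json.Unmarshal(raw, &f)
	return f, err
}

// parseStrict rejects any property the struct does not declare.
func parseStrict(raw []byte) (coreIndexFile, error) {
	var f coreIndexFile
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.DisallowUnknownFields()
	err := dec.Decode(&f)
	return f, err
}

func main() {
	raw := []byte(`{"writerLockId":"lock-1","unexpected":1}`)

	_, lenientErr := parseLenient(raw)
	_, strictErr := parseStrict(raw)

	fmt.Println("lenient accepts:", lenientErr == nil)
	fmt.Println("strict rejects:", strictErr != nil)
}
```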
Tested: TestProcessorPerFieldCaps (writerLockId/embedded-URI reject + at-cap
accept), TestProcessorProvisionalFieldCaps, TestProcessorDeltaSizeCap,
TestProcessorDeltaSizeCapCountsUnknownFields (the raw-bytes anti-bypass), and
TestProcessorAcceptsRealisticAnchor (a realistic batch with a 59-char CIDv1, long
reveal values, and a real delta processes cleanly — the over-rejection regression
guard). go.mod: gowebpki/jcs promoted to a direct dependency.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Sidetree caps several per-field sizes when parsing the index/proof/chunk files; our reader enforced none of them. This adds the caps the reference implementation actually enforces at parse time and, guided by an adversarial review against decentralized-identity/sidetree v1.0.6, deliberately omits two it does not, to avoid rejecting anchors the network accepts. Plan: docs/plans/2026-06-04-001-feat-ion-value-locking-protocol-rules-plan.md.

Enforced

| Field | Cap |
| --- | --- |
| writerLockId (core index) | MaxWriterLockIDInBytes (200) |
| coreProofFileUri, provisionalIndexFileUri (core index) | MaxCASURILength (100) |
| provisionalProofFileUri, chunkFileUri (provisional index) | MaxCASURILength (100) |
| each operation delta, canonicalized (JCS) size | MaxDeltaSizeInBytes (1000) |

The delta size is measured on the raw on-wire bytes of each delta, not a re-marshaled did.Delta (which drops unknown fields and would undercount). This matches the reference, which canonicalizes the parsed object including unknown fields. Violations → classifyMalformed → ErrMalformed (permanent skip). New sentinels: ErrCASURITooLong, ErrWriterLockIDTooLong, ErrDeltaTooLarge.

Deliberately NOT enforced (would diverge from the reference: over-rejection)
An adversarial review caught two over-rejections in the first cut; both were removed:
- The reveal value is not length-checked. The reference validates it as a supported-algorithm multihash, not by length (maxEncodedRevealValueLength is unused there). The SHA-256 algorithm is enforced downstream by ion-sdk-go's did.CheckReveal at apply time. A parse-time length cap here would permanently reject valid reveals (one-directional, fatal divergence).
- The anchor's own core-index CID is not length-checked. The reference applies maxCasUriLength only to embedded URIs; a valid long CIDv1 must not be rejected.

So the SHA-256-only multihash rule (hashAlgorithmsInMultihashCode=[18]) is satisfied downstream (did.CheckReveal; suffixes/commitments computed with SHA-256), not re-validated at parse here.

Known follow-up (pre-existing, out of scope)
The reference rejects unknown top-level JSON properties on every file type (validateObjectContainsOnlyAllowedProperties); our struct-based parsing silently drops them (a systemic accept-what-ION-rejects gap, not introduced here). Worth a dedicated strict-schema pass.

Testing
- TestProcessorPerFieldCaps: writerLockId / embedded-URI reject and at-cap (200) accept.
- TestProcessorProvisionalFieldCaps: provisional proof / chunk URI caps.
- TestProcessorDeltaSizeCap + TestProcessorDeltaSizeCapCountsUnknownFields: the raw-bytes anti-bypass (a delta small once parsed but large on the wire is still rejected).
- TestProcessorAcceptsRealisticAnchor: over-rejection regression guard. A realistic batch (59-char CIDv1 anchor CID, reveal values longer than the removed 50-byte cap, a real ~900-byte delta) processes cleanly.
- gofmt clean; go build, go vet, go test -race -count=1 ./... green.
- go.mod: gowebpki/jcs promoted to a direct dependency.

Post-Deploy Monitoring & Validation

- ErrMalformed skips carrying ErrCASURITooLong / ErrWriterLockIDTooLong / ErrDeltaTooLarge.

🤖 Generated with Claude Code