ceap: support $ref schemas in AssemblySpecification and resolve at query time #132

Merged
absoludity merged 11 commits into master from ceap/assembly-spec-jsonschema-ref
Apr 9, 2026

@absoludity absoludity commented Apr 8, 2026

Collaborator

UNTP and similar standards publish their JSON schemas at stable URLs, but the current code requires the full schema to be embedded inline (it was using the 0.7.0 schema, which isn't published). Embedding copies of published schemas would lead to version drift and bloated JSON fixtures. So I'm switching back to the 0.6.0 schema and using the published version, hence this change in julee to support refs.

This change allows AssemblySpecification.jsonschema to hold a bare {"$ref": "url#/ptr"} value. The ref is validated synchronously during model construction so bad URLs are caught early, but the resolved schema is not stored — it is re-fetched on each query iteration so that patch updates to the published schema are picked up automatically. ExtractAssembleDataUseCase resolves the schema asynchronously via httpx just before building the PointableJSONSchema, keeping resolution out of the domain model itself.
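To make the shape concrete, here is a minimal sketch of a bare `$ref` value and how it splits into a fetchable URL and a JSON Pointer. The helper names `is_bare_ref` and `split_ref` and the example URL are hypothetical and not taken from julee.

```python
from urllib.parse import urldefrag


def is_bare_ref(schema: dict) -> bool:
    """True when the schema is exactly {"$ref": "<string>"} and nothing else."""
    return set(schema) == {"$ref"} and isinstance(schema["$ref"], str)


def split_ref(ref: str) -> tuple[str, str]:
    """Split 'url#/ptr' into (url, pointer); pointer is '' without a fragment."""
    url, fragment = urldefrag(ref)
    return url, fragment


# Hypothetical URL, for illustration only.
schema = {"$ref": "https://example.org/schemas/passport.json#/$defs/Product"}
assert is_bare_ref(schema)
print(split_ref(schema["$ref"]))
```

An inline schema such as `{"type": "object"}` fails the `is_bare_ref` check and would be used as-is.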

…ery time

UNTP and similar standards publish their JSON schemas at stable URLs.
Embedding copies leads to version drift and large JSON fixtures.

Allow AssemblySpecification.jsonschema to be a bare {'$ref': 'url#/ptr'}
value. The ref is validated synchronously during model construction (to
catch bad URLs early) but the resolved schema is not stored — it is
fetched fresh each time a query is assembled so that patch updates to
the published schema are picked up automatically.

ExtractAssembleDataUseCase._assemble_iteration now resolves the schema
once per iteration (async, via httpx) before building the PointableJSONSchema
and passes the resolved dict to _validate_assembled_data.

Cover the new $ref code paths:
- TestAssemblyRefSchemaValidation verifies that a bare $ref is accepted,
  stored unchanged, validated against the resolved schema content (including
  JSON Pointer checks against the resolved shape), and survives a
  serialisation roundtrip.
- TestResolveJsonSchema verifies the async _resolve_jsonschema helper:
  inline schemas pass through unchanged; a bare $ref is fetched and returned;
  a fragment ref extracts the sub-schema and bundles parent $defs.

A shared conftest.py fixture starts a minimal stdlib HTTP server (random
free port, no test dependencies) used by both test classes.
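A stdlib-only server of the kind described could look roughly like the sketch below. It is written as a plain context manager rather than the actual conftest.py fixture, and `schema_server` and `fetch_json` are hypothetical names; wrapping it in a pytest fixture would be a thin layer on top.

```python
import http.client
import json
import threading
from contextlib import contextmanager
from http.server import BaseHTTPRequestHandler, HTTPServer


@contextmanager
def schema_server(schemas: dict):
    """Serve each {path: schema} entry as JSON on 127.0.0.1; yields the port."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            schema = schemas.get(self.path)
            if schema is None:
                self.send_error(404)
                return
            body = json.dumps(schema).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):  # keep test output quiet
            pass

    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: OS picks a free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        yield server.server_port
    finally:
        server.shutdown()
        server.server_close()
        thread.join()


def fetch_json(port: int, path: str) -> dict:
    """Fetch and decode a JSON document from the local test server."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()


with schema_server({"/schema.json": {"type": "object"}}) as port:
    print(fetch_json(port, "/schema.json"))
```

Binding to port 0 lets the OS choose the port, which avoids collisions when tests run in parallel.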
_fetch_and_resolve_ref and _resolve_jsonschema both duplicated the same
fragment-extraction and $defs-bundling logic. Extract it into
extract_schema_from_fetched() in a shared ceap._schema_ref module so the
two callers only differ in how they perform the HTTP fetch (sync vs async).
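As a rough illustration of the shared helper, the sketch below extracts a sub-schema by JSON Pointer and carries the parent's `$defs` along so that internal `#/$defs/...` references in the sub-schema still have a target. Only the name `extract_schema_from_fetched` comes from the commit; the signature, the `resolve_pointer` helper and the exact bundling rule are assumptions.

```python
from urllib.parse import unquote


def resolve_pointer(document, pointer: str):
    """Resolve an RFC 6901 JSON Pointer ('' means the whole document)."""
    if pointer == "":
        return document
    if not pointer.startswith("/"):
        raise ValueError(f"malformed JSON Pointer: {pointer!r}")
    node = document
    for raw in pointer[1:].split("/"):
        # Undo URI percent-encoding first, then the pointer escapes (~1 before ~0).
        token = unquote(raw).replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def extract_schema_from_fetched(fetched: dict, fragment: str) -> dict:
    """Return the sub-schema at `fragment`, bundling the parent's $defs."""
    sub = resolve_pointer(fetched, fragment)
    if fragment and isinstance(sub, dict) and "$defs" in fetched and "$defs" not in sub:
        # Build a new dict so the fetched document is not mutated.
        sub = {**sub, "$defs": fetched["$defs"]}
    return sub


fetched = {
    "$defs": {
        "Name": {"type": "string"},
        "Product": {"type": "object", "properties": {"name": {"$ref": "#/$defs/Name"}}},
    }
}
product = extract_schema_from_fetched(fetched, "/$defs/Product")
print(sorted(product))  # ['$defs', 'properties', 'type']
```

With a helper of this shape, the sync and async callers would each fetch the document their own way and then hand the parsed dict and fragment to the same function.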
@absoludity absoludity requested a review from monkeypants April 9, 2026 00:47
@absoludity absoludity marked this pull request as draft April 9, 2026 01:40
Direct httpx calls in ExtractAssembleDataUseCase violated Temporal's
determinism requirement (non-deterministic I/O in workflow code breaks
replay) and Clean Architecture (HTTP is infrastructure, not use case
logic). The sandbox ban on httpx at import time was the signal, not a
technicality to work around.

Introduces RemoteSchemaRepository — a single fetch(url) protocol — run
through the standard three-layer pattern: HttpRemoteSchemaRepository
(Layer 1), TemporalHttpRemoteSchemaRepository (Layer 2 activity),
WorkflowRemoteSchemaRepositoryProxy (Layer 3 proxy). The use case
_resolve_jsonschema becomes an instance method calling the injected
repo, keeping all I/O behind the activity boundary.
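The dependency-injection shape described here can be sketched without httpx or Temporal. In the sketch below, the in-memory repository is a hypothetical stand-in for the HTTP-backed Layer 1 class, the constructor parameter name is invented, `fetch` is assumed to be async, and fragment extraction is left out for brevity.

```python
import asyncio
from typing import Any, Protocol


class RemoteSchemaRepository(Protocol):
    """Single-method protocol: fetch the JSON document at a URL."""

    async def fetch(self, url: str) -> dict[str, Any]: ...


class InMemoryRemoteSchemaRepository(RemoteSchemaRepository):
    """Hypothetical stand-in for the HTTP-backed implementation."""

    def __init__(self, documents: dict[str, dict[str, Any]]) -> None:
        self._documents = documents

    async def fetch(self, url: str) -> dict[str, Any]:
        return self._documents[url]


class ExtractAssembleDataUseCase:
    """Only the schema-resolution part; the real use case does much more."""

    def __init__(self, remote_schema_repo: RemoteSchemaRepository) -> None:
        self._remote_schema_repo = remote_schema_repo

    async def _resolve_jsonschema(self, jsonschema: dict[str, Any]) -> dict[str, Any]:
        if set(jsonschema) != {"$ref"}:
            return jsonschema  # inline schemas pass through unchanged
        url, _, _fragment = jsonschema["$ref"].partition("#")
        # All I/O goes through the injected repository, never a direct HTTP call.
        return await self._remote_schema_repo.fetch(url)


repo = InMemoryRemoteSchemaRepository({"https://example.org/s.json": {"type": "object"}})
use_case = ExtractAssembleDataUseCase(repo)
print(asyncio.run(use_case._resolve_jsonschema({"$ref": "https://example.org/s.json"})))
```

Because the use case only sees the protocol, a workflow can be handed a proxy that routes `fetch` to an activity while tests pass in a fake like the one above.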
Domain model validators cannot make HTTP calls — they run at object
construction and deserialization time, including inside Temporal workflow
replay where networking is banned by the sandbox.

Bare $ref schemas are now accepted as-is in the jsonschema validator;
existence-checking of JSON Pointer keys against the resolved schema is
skipped when the schema is a bare $ref. Both deferral cases have
explicit comments explaining when resolution actually happens
(RemoteSchemaRepository at assembly time).
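The deferral can be illustrated with a plain function instead of the actual model validator. This is a sketch under assumptions: `validate_pointers` and its helpers are hypothetical names, and the pointer-format regex is one possible syntactic check, not necessarily the one julee uses.

```python
import re

# Syntactic check only: '' or '/'-prefixed tokens whose '~' escapes are ~0 or ~1.
_POINTER_RE = re.compile(r"^(/([^/~]|~[01])*)*$")


def is_bare_ref(schema: dict) -> bool:
    return set(schema) == {"$ref"}


def pointer_exists(schema: dict, pointer: str) -> bool:
    """Walk dict keys only; enough for pointers into object-shaped schemas."""
    node = schema
    for token in pointer.split("/")[1:]:
        token = token.replace("~1", "/").replace("~0", "~")
        if not isinstance(node, dict) or token not in node:
            return False
        node = node[token]
    return True


def validate_pointers(schema: dict, pointers: list[str]) -> None:
    # Format is always checked; it needs no network access.
    for pointer in pointers:
        if not _POINTER_RE.match(pointer):
            raise ValueError(f"malformed JSON Pointer: {pointer!r}")
    if is_bare_ref(schema):
        # Deferred: the remote schema is only resolved at assembly time,
        # so existence cannot be checked here.
        return
    for pointer in pointers:
        if not pointer_exists(schema, pointer):
            raise ValueError(f"pointer not found in schema: {pointer!r}")


# Accepted at construction even though the target has not been fetched.
validate_pointers({"$ref": "https://example.org/s.json"}, ["/properties/name"])
```

An inline schema still gets the full existence check, so only the remote case moves its failure to assembly time.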
Tests that expected ValidationError on unresolvable URLs or missing
pointers in remote schemas were written against the old eager-fetch
behaviour. Now that the domain model defers resolution to assembly time,
those cases are accepted at construction; update tests accordingly and
add an explicit test that malformed pointer format is still rejected.

temporal_activity_registration discovers methods to wrap by scanning for
Protocol classes in the MRO. Without explicit inheritance, the Protocol
never appears in the MRO and fetch is never registered as an activity,
causing NotFoundError at runtime.
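The difference between structural and explicit conformance shows up directly in the MRO. The sketch below is not the temporal_activity_registration implementation; `protocol_methods` is a hypothetical scanner that mimics the described discovery, and it leans on the private `_is_protocol` attribute from `typing` purely to stay short.

```python
from typing import Protocol


class RemoteSchemaRepository(Protocol):
    def fetch(self, url: str) -> dict: ...


def protocol_methods(cls: type) -> set[str]:
    """Collect public methods declared on Protocol classes found in cls's MRO."""
    names: set[str] = set()
    for base in cls.__mro__:
        # Skip typing.Protocol itself and anything that is not a protocol class.
        if base is Protocol or not getattr(base, "_is_protocol", False):
            continue
        names.update(
            name
            for name, value in vars(base).items()
            if callable(value) and not name.startswith("_")
        )
    return names


class Structural:  # matches the protocol by shape only
    def fetch(self, url: str) -> dict:
        return {}


class Explicit(RemoteSchemaRepository):  # the Protocol appears in the MRO
    def fetch(self, url: str) -> dict:
        return {}


print(protocol_methods(Structural))  # set()
print(protocol_methods(Explicit))    # {'fetch'}
```

A type checker accepts both classes as a `RemoteSchemaRepository`, but only the explicit subclass gives an MRO-based scan anything to find.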
@absoludity absoludity marked this pull request as ready for review April 9, 2026 04:03
@absoludity absoludity merged commit f789cbf into master Apr 9, 2026
7 checks passed
@absoludity absoludity deleted the ceap/assembly-spec-jsonschema-ref branch April 9, 2026 04:03