ceap: support $ref schemas in AssemblySpecification and resolve at query time #132
Merged
UNTP and similar standards publish their JSON schemas at stable URLs.
Embedding copies leads to version drift and large JSON fixtures.
Allow AssemblySpecification.jsonschema to be a bare {'$ref': 'url#/ptr'}
value. The ref is validated synchronously during model construction (to
catch bad URLs early) but the resolved schema is not stored — it is
fetched fresh each time a query is assembled so that patch updates to
the published schema are picked up automatically.
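The bare-ref shape above can be recognised with the standard library alone. This is a minimal sketch, not the ceap code. `split_bare_ref` is a hypothetical helper, and it assumes that a "bare" ref means a dict whose only key is `$ref`, and that the fragment is a URI-encoded JSON Pointer (RFC 6901, section 6).

```python
from __future__ import annotations

from typing import Any
from urllib.parse import unquote, urldefrag


def split_bare_ref(schema: dict[str, Any]) -> tuple[str, str] | None:
    """Return (url, json_pointer) for a bare {"$ref": ...} schema, else None.

    Hypothetical helper: "bare" is assumed to mean "$ref" is the only key.
    """
    if set(schema) != {"$ref"} or not isinstance(schema["$ref"], str):
        return None  # inline schema (or a $ref mixed with other keywords)
    url, fragment = urldefrag(schema["$ref"])
    # The fragment is a URI-encoded JSON Pointer; "" selects the whole document.
    return url, unquote(fragment)
```

Only the original dict is kept on the model; the URL and pointer are re-derived whenever the schema is needed, which is what lets a later fetch pick up patch releases of the published schema.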
ExtractAssembleDataUseCase._assemble_iteration now resolves the schema
once per iteration (async, via httpx) before building the PointableJSONSchema
and passes the resolved dict to _validate_assembled_data.
Cover the new $ref code paths:

- TestAssemblyRefSchemaValidation verifies that a bare $ref is accepted, stored unchanged, validated against the resolved schema content (including JSON Pointer checks against the resolved shape), and survives a serialisation roundtrip.
- TestResolveJsonSchema verifies the async _resolve_jsonschema helper: inline schemas pass through unchanged; a bare $ref is fetched and returned; a fragment ref extracts the sub-schema and bundles parent $defs.

A shared conftest.py fixture starts a minimal stdlib HTTP server (random free port, no test dependencies) that both test classes use.
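A stdlib-only schema server like the one described for conftest.py can be sketched as follows. The `serve_json` name and its `{path: document}` argument are hypothetical; the actual fixture is not shown here. Binding to port 0 is how "random free port" is typically achieved: the OS picks an unused port.

```python
from __future__ import annotations

import json
import threading
from contextlib import contextmanager
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Any, Iterator


@contextmanager
def serve_json(documents: dict[str, Any]) -> Iterator[str]:
    """Serve {path: json_document} on 127.0.0.1 and yield the base URL."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            doc = documents.get(self.path)
            if doc is None:
                self.send_error(404)
                return
            body = json.dumps(doc).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format: str, *args: Any) -> None:
            pass  # keep test output quiet

    # Port 0 asks the OS for any free port, so parallel runs don't collide.
    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        host, port = server.server_address[:2]
        yield f"http://{host}:{port}"
    finally:
        server.shutdown()
        server.server_close()
```

In a conftest.py, a pytest fixture would simply enter `serve_json(...)` and yield the base URL, so each test can build refs such as `f"{base}/schema.json#/$defs/Product"`.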
_fetch_and_resolve_ref and _resolve_jsonschema both duplicated the same fragment-extraction and $defs-bundling logic. Extract it into extract_schema_from_fetched() in a shared ceap._schema_ref module so the two callers only differ in how they perform the HTTP fetch (sync vs async).
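The signature of extract_schema_from_fetched is not shown in this PR page, so the following is a hypothetical stand-in (`resolve_pointer` and `extract_subschema` are illustrative names) for the fragment-extraction and `$defs`-bundling step it describes. It assumes the fetched document keeps shared definitions under a root-level `$defs`.

```python
from __future__ import annotations

import copy
from typing import Any


def resolve_pointer(document: Any, pointer: str) -> Any:
    """Walk an RFC 6901 JSON Pointer such as "/$defs/Product" through document."""
    if pointer == "":
        return document  # the empty pointer selects the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON Pointer: {pointer!r}")
    node = document
    for token in pointer[1:].split("/"):
        # Unescape "~1" -> "/" before "~0" -> "~", as RFC 6901 requires.
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def extract_subschema(fetched: dict[str, Any], pointer: str) -> Any:
    """Return a copy of the sub-schema at pointer, with the root's $defs bundled in.

    Once detached from its root, internal refs such as {"$ref": "#/$defs/Address"}
    resolve against the fragment itself, so the root-level $defs are copied onto it.
    """
    subschema = copy.deepcopy(resolve_pointer(fetched, pointer))
    if pointer and isinstance(subschema, dict) and "$defs" in fetched:
        bundled = copy.deepcopy(fetched["$defs"])
        # Definitions the sub-schema declares itself take precedence.
        bundled.update(subschema.get("$defs", {}))
        subschema["$defs"] = bundled
    return subschema
```

Only refs into `#/$defs` are repaired this way; a fragment that points elsewhere in its original document (for example `#/properties/...`) would still need rewriting.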
Direct httpx calls in ExtractAssembleDataUseCase violated Temporal's determinism requirement (non-deterministic I/O in workflow code breaks replay) and Clean Architecture (HTTP is infrastructure, not use case logic). The sandbox ban on httpx at import time was the signal, not a technicality to work around. Introduces RemoteSchemaRepository — a single fetch(url) protocol — run through the standard three-layer pattern: HttpRemoteSchemaRepository (Layer 1), TemporalHttpRemoteSchemaRepository (Layer 2 activity), WorkflowRemoteSchemaRepositoryProxy (Layer 3 proxy). The use case _resolve_jsonschema becomes an instance method calling the injected repo, keeping all I/O behind the activity boundary.
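The port-and-injection shape described above might look roughly like this. RemoteSchemaRepository, fetch(url) and _resolve_jsonschema come from the commit message; the async signature, the `InMemoryRemoteSchemaRepository` test double and the `ResolvingUseCase` class are assumptions for illustration. The Temporal activity (Layer 2) and workflow proxy (Layer 3) are omitted because they depend on the project's own registration helpers.

```python
from __future__ import annotations

from typing import Any, Protocol


class RemoteSchemaRepository(Protocol):
    """Port: fetch a JSON document by URL. Implementations own all I/O."""

    async def fetch(self, url: str) -> dict[str, Any]: ...


class InMemoryRemoteSchemaRepository(RemoteSchemaRepository):
    """Hypothetical test double that serves canned documents and records calls."""

    def __init__(self, documents: dict[str, dict[str, Any]]) -> None:
        self.documents = documents
        self.calls: list[str] = []

    async def fetch(self, url: str) -> dict[str, Any]:
        self.calls.append(url)
        return self.documents[url]


class ResolvingUseCase:
    """Hypothetical slice of a use case: no HTTP client, only the injected port."""

    def __init__(self, schema_repo: RemoteSchemaRepository) -> None:
        self.schema_repo = schema_repo

    async def _resolve_jsonschema(self, schema: dict[str, Any]) -> Any:
        if set(schema) != {"$ref"}:
            return schema  # inline schema: nothing to fetch
        url, _, pointer = schema["$ref"].partition("#")
        document = await self.schema_repo.fetch(url)
        # Simplified pointer walk: no percent-decoding, arrays or $defs bundling.
        node: Any = document
        for token in pointer.split("/")[1:]:
            node = node[token.replace("~1", "/").replace("~0", "~")]
        return node
```

Because the use case only awaits `schema_repo.fetch`, injecting a workflow-side proxy instead of an HTTP implementation moves the actual request behind the activity boundary without touching the use case.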
Domain model validators cannot make HTTP calls: they run at object construction and deserialization time, including inside Temporal workflow replay, where the sandbox bans networking. Bare $ref schemas are now accepted as-is by the jsonschema validator, and existence-checking of JSON Pointer keys against the resolved schema is skipped when the schema is a bare $ref. Both deferral cases have explicit comments explaining when resolution actually happens (RemoteSchemaRepository at assembly time).
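A network-free validator that follows this split (syntax always checked, existence deferred for bare refs) could be sketched as below. `validate_schema_pointers` and `check_pointer_format` are hypothetical names, and the sketch assumes the specification's pointers arrive as a plain iterable of strings; the real AssemblySpecification field layout is not shown here.

```python
from __future__ import annotations

import re
from typing import Any, Iterable

_BAD_TILDE = re.compile(r"~(?![01])")  # RFC 6901 only allows "~0" and "~1"


def check_pointer_format(pointer: str) -> None:
    """Syntax-only JSON Pointer check; needs no network, so it is always safe."""
    if pointer and not pointer.startswith("/"):
        raise ValueError(f"JSON Pointer must be empty or start with '/': {pointer!r}")
    if _BAD_TILDE.search(pointer):
        raise ValueError(f"'~' must be followed by '0' or '1': {pointer!r}")


def validate_schema_pointers(jsonschema: dict[str, Any], pointers: Iterable[str]) -> None:
    bare_ref = set(jsonschema) == {"$ref"}
    for pointer in pointers:
        check_pointer_format(pointer)
        if bare_ref:
            # Deferred: resolving the ref would need HTTP, which is off-limits here
            # (validators also run during workflow replay). Resolution happens
            # later, at assembly time, via the schema repository.
            continue
        node: Any = jsonschema
        for token in pointer.split("/")[1:]:
            token = token.replace("~1", "/").replace("~0", "~")
            # Objects only: array indices are not handled in this sketch.
            if not isinstance(node, dict) or token not in node:
                raise ValueError(f"pointer {pointer!r} not found in schema")
            node = node[token]
```

This also mirrors the updated test expectation: a pointer into an unresolved remote schema is accepted at construction, while a malformed pointer is still rejected.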
Tests that expected ValidationError on unresolvable URLs or missing pointers in remote schemas were written against the old eager-fetch behaviour. Now that the domain model defers resolution to assembly time, those cases are accepted at construction; update tests accordingly and add an explicit test that malformed pointer format is still rejected.
temporal_activity_registration discovers methods to wrap by scanning for Protocol classes in the MRO. Without explicit inheritance from the RemoteSchemaRepository Protocol, the Protocol never appears in the MRO, so fetch is never registered as an activity, causing NotFoundError at runtime.
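The registration code itself is not shown in this PR page. The hypothetical `protocol_methods` helper below imitates the described MRO scan to show why structural conformance alone is invisible to it. It assumes a protocol class is identified by listing `typing.Protocol` directly among its bases, which Python requires when declaring a protocol.

```python
from __future__ import annotations

from typing import Any, Protocol


def protocol_methods(cls: type) -> set[str]:
    """Public methods declared on any Protocol class found in cls's MRO."""
    names: set[str] = set()
    for base in cls.__mro__:
        # A protocol class must name Protocol directly among its bases.
        if Protocol in base.__bases__:
            names.update(
                name
                for name, value in vars(base).items()
                if callable(value) and not name.startswith("_")
            )
    return names


class RemoteSchemaRepository(Protocol):
    async def fetch(self, url: str) -> dict[str, Any]: ...


class StructuralOnly:
    """Matches the Protocol's shape, but the Protocol is not in its MRO."""

    async def fetch(self, url: str) -> dict[str, Any]:
        return {}


class ExplicitlyInherits(RemoteSchemaRepository):
    """Same method, but the Protocol now appears in the MRO."""

    async def fetch(self, url: str) -> dict[str, Any]:
        return {}
```

A static type checker treats both classes as valid RemoteSchemaRepository implementations, but only the explicit subclass exposes `fetch` to an MRO-based scan.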
UNTP and similar standards publish their JSON schemas at stable URLs, but the current code requires the full schema to be embedded inline (it had to be, because we were using the 0.7.0 schema, which isn't published). Embedding copies of published schemas would lead to version drift and bloated JSON fixtures. We are therefore switching back to the published 0.6.0 schema and referencing it by URL, which is why julee needs to support refs.
This change allows `AssemblySpecification.jsonschema` to hold a bare `{"$ref": "url#/ptr"}` value. The ref is validated synchronously during model construction so bad URLs are caught early, but the resolved schema is not stored; it is re-fetched on each query iteration so that patch updates to the published schema are picked up automatically. `ExtractAssembleDataUseCase` resolves the schema asynchronously via httpx just before building the `PointableJSONSchema`, keeping resolution out of the domain model itself.