Skip to content

refactor: deterministic literal property targets + indexed WHERE filters#837

Closed
HexaField wants to merge 23 commits into
devfrom
refactor/literal-channel-v-separation
Closed

refactor: deterministic literal property targets + indexed WHERE filters#837
HexaField wants to merge 23 commits into
devfrom
refactor/literal-channel-v-separation

Conversation

@HexaField

@HexaField HexaField commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Summary

Two coupled changes to how resolveLanguage='literal' property values flow through the executor.

Property writes become deterministic. Today, setting a literal-language property routes through LanguageController.expression_create, which signs the value into a literal:json:<envelope> URL containing {author, timestamp, data, proof} — the same author + timestamp the link reifier already carries. The signature varies per write, so writing name = "general" produces a different IRI every time, and exact-match SPARQL filters never hit. This PR drops the wrap: resolve_property_value emits a plain literal:string:X (or :number: / :boolean: / :json: for primitives) URL via literal_encode directly. The reifier remains the single source of truth for per-link provenance.

WHERE filters use the index. With deterministic IRIs in storage, the WHERE builders for the model query layer can emit direct IRI matches (?source <pred> <literal:string:X>) instead of wrapping every target in a custom fn/parse_literal SPARQL function call. Native POS-index probe replaces a per-row function call.

cc @lucksus @data-bot-coasys — lands on top of #803's history. Commit range a9e98ccd..a00b595b iterated this same territory; the 353a7ad7 revert was driven by integration tests that codified the envelope shape, which this PR restructures.

What changed

Writes (executor)

  • perspective_instance.rs::resolve_property_value — when resolve_language == "literal", encodes the value directly via literal_encode and returns the plain URL. Other resolveLanguage values (real language controllers — note, image, etc.) still go through expression_create; those produce addressable signed expressions by design.
  • languages/literal.rs::literal_decode — no longer wraps primitive return values in a synthetic {author: "<unknown>", timestamp: "<unknown>", data: …, proof: {}} envelope. Returns the raw value. Genuine signed expressions (JSON objects) pass through unchanged.
  • core/src/perspectives/SparqlBindings.ts::parseLit — drops the .data extraction branch. With property values now stored as plain literals, the Literal.fromUrl(val).get() result is already the value, and the only objects it returns are real JSON payloads (which we JSON-stringify for display).

Migration

  • Restored migrate_signed_envelopes_to_plain_literals (originally landed in a9e98ccd, removed by 353a7ad7). On startup it walks every reifier, finds targets shaped as literal:json:<envelope> whose JSON has data+author+proof, extracts the inner value, and rewrites both the direct triple and the reifier (the reifier IRI hash includes the target, so it must be recomputed). Idempotent — short-circuits at migration_version >= 3. Wired into initialize_from_db after the named-graphs migration.
  • Envelope-unwrap branches retained in parse_literal_fn and parse_literal_value so SPARQL queries continue to handle pre-migration data + the small set of expressions that are themselves stored as signed envelopes (e.g. entanglement proofs in the dapp).

Indexed WHERE filters

  • model_query/sparql_builder.rsis_literal_prop equality WHERE filters emit ?source <pred> <literal:string:X_encoded> directly. String / number / boolean / StringArray / NumberArray covered. Ops (gt/lt/between/contains/not) keeps fn/parse_literal since it needs typed comparison.
  • Fallback for constructor-default raw IRIs (e.g. state = 'todo://ready' on a resolveLanguage='literal' property — the #803 d3da07d7 case): { ?source <pred> <literal:string:encoded> } UNION { ?source <pred> <raw-iri> }. Both branches are POS-index probes; the planner picks one.
  • model_query/projection.rs::build_projection_where_patterns — same transformation for the projection WHERE-clause builder. pred_lookup widened to carry is_literal_prop.

Tests

  • tests/js/tests/prolog-and-literals.test.ts — three round-trip assertions flipped from expect(literal.data).to.equal(X) to expect(literal).to.equal(X) to match the new (un-wrapped) Literal.fromUrl().get() return.
  • model_query/integration_tests.rs — the helper that built envelope-shaped targets and the tests that asserted query behaviour against them are renamed legacy_envelope_* and now call migrate_signed_envelopes_to_plain_literals before querying (genuine migration tests, not "queries work on envelope storage" tests). Parallel plain_literal_* tests cover the post-migration form.

What did NOT change

  • expression.create(value, "literal") still produces signed envelopes wrapped as literal:json:<envelope>. ExpressionClient.get(url)'s envelope fast-path is preserved.
  • Agent profile expression API (create_signed_expression for link metadata) is untouched.
  • SDNA payload literals (Prolog source strings, SHACL action JSON) — already plain literals before this PR, still plain.
  • Multi-user expression-authoring tests — unchanged contract.

Performance

Microbenchmark (bench_indexed_iri_vs_fn_parse_literal_filter in model_query/integration_tests.rs, gated to release builds, scale via WT_BENCH_LINKS) running the two equivalent SPARQL forms against the same literal:string: data on this branch:

n links indexed direct-IRI fn/parse_literal FILTER speedup
1,000 11.4 µs 290.8 µs 25.5×
5,000 12.5 µs 1,476.4 µs 118.4×
10,000 13.5 µs 3,046.1 µs 226.2×
20,000 15.8 µs 6,215.0 µs 394.2×
50,000 39.0 µs 19,436.0 µs 498.8×

Indexed stays flat (POS-index probe — sub-linear in the result count, ~unchanged by the dataset size). FILTER scales linearly because every row materialises through the custom function. Apple Silicon, 5 runs each, warm cache; reproduce with cargo test --release --lib bench_indexed_iri_vs_fn_parse_literal_filter -- --nocapture.

Wind tunnel S8 comparison vs origin/dev was attempted but the cached CUSTOM_DENO_SNAPSHOT.bin produces a V8 magic-number mismatch against fresh executor builds (Check failed: magic_number_ == SerializedData::kMagicNumber), causing the executor to crash on language-runtime init under the wind tunnel's cargo build --release path regardless of branch. That's a tunnel-infrastructure issue unrelated to this PR; the microbench above captures the relevant per-query delta directly.

Test plan

  • cargo check --tests clean
  • cargo test --lib languages::literal — 4 pass
  • cargo test --lib perspectives::model_query::utils — 15 pass
  • cargo test --lib perspectives::sparql_store — 78 pass
  • cargo test --lib perspectives::model_query — 146 pass (incl. the renamed legacy_envelope_migrated_then_* + new plain_literal_* tests)
  • pnpm exec tsc --noEmit (core) — clean
  • pnpm exec jest src/Literal.test.ts — 5 pass
  • Full CircleCI suite green on this PR
  • Wind tunnel s1/s2/s5/s8 + WHERE-heavy scenarios — dev vs this branch
  • Integration test runner (tests/js) — full suite
  • Manual smoke: spin executor against existing dataset with envelope-encoded literals, verify migration log fires + post-migration query returns expected results

Migration safety

  • Idempotent: migration_version >= 3 short-circuits.
  • Per-link transactional: each envelope-link replacement is one insert + one delete batch. A crash mid-migration leaves a mix of pre- and post-migration data — both readable thanks to the back-compat envelope-unwrap branches in parse_literal_fn and parse_literal_value.
  • Only touches reifier targets where the JSON parses, has .data+.author+.proof, and the new encoded form differs from the old. Won't accidentally rewrite valid literal:json: payloads that aren't envelopes.

Follow-ups

  • Typed RDF literals on the wire ("value"^^xsd:string instead of <literal:string:value>) and removing fn/parse_literal registration entirely — tackled in #842, stacked on this PR.
  • Pushing last-write-wins / hydration aggregation into SPARQL (currently in Rust) — explored in #846. The straightforward nested-aggregate plan hits an Oxigraph 0.5.8 planner cliff (the warning in ac57680b9 was confirmed empirically at ~23,000× regression). The most plausible unblock is storage-level partitioning from named graphs (#812); re-attempt once that lands and bench whether per-graph scoping flattens the planner working set. Note: SPARQL itself does not have window functions — neither 1.1 nor the W3C 1.2 draft introduces them — so the wait is on Oxigraph planner / dataset semantics, not on a spec change.
  • Flux read sites that assumed the envelope shape via .get().data and write sites that produced envelopes via client.expression.create(value, "literal") — handled in coasys/flux#604.

Summary by CodeRabbit

  • New Features

    • Automatic, idempotent migration of legacy signed-envelope literals to plain-literal storage during initialization
  • Improvements

    • Deterministic encoding/decoding for string/number/boolean literals for reliable exact-match queries and filtering
    • Writes now store literal-typed properties in deterministic literal form for consistent indexing
    • SPARQL query generation improved to match literal properties directly with safer IRI handling
    • Literal parsing unwraps signed-expression envelopes for comparisons
  • Tests

    • Added/updated migration and literal-related tests and a release-only benchmark

HexaField and others added 8 commits June 4, 2026 00:01
…lue (V1)

Part of the Channel V / Channel E separation. See
~/.sovereign/membranes/coasys/research/literal-encoding-and-sparql-pushdown-2026-06-03.md
§9.1 for the audit.

When resolve_language == "literal", encode the value directly via
literal_encode and return a deterministic plain URI
(literal:string:X / :number: / :boolean: / :json:) instead of going
through LanguageController.expression_create. Provenance for property
values lives on the link reifier — the signed-envelope shape is for
Channel E (expression.create) callers, not for property storage.

For all other resolve_language values (real language controllers —
note, image, etc.) the existing expression_create flow is unchanged.
literal_decode used to wrap non-object primitives in a fake
{author: "<unknown>", timestamp: "<unknown>", data: value, proof: {}}
envelope. That conflated Channel V (link-property values; provenance
lives on the link reifier) with Channel E (signed expressions).

Now primitives decode to their raw JSON value and objects pass through
unchanged. Channel-E callers that legitimately want envelope shape go
through create_signed_expression, not literal_decode.

Unit tests updated to assert raw round-trip; the JSON-object test is
unchanged.
After the Channel V refactor, Literal.fromUrl(val).get() for property
values returns the plain primitive directly. The only objects it can
return now are legitimate JSON objects from literal:json: payloads,
which we JSON-stringify for display.

Drops the .data extraction branch that used to unwrap signed-envelope
literals — those no longer exist for new property writes.
After the Channel V refactor, properties with resolveLanguage: "literal"
store their values as plain literal:string:X / :number: / :boolean: /
:json: URIs — no signed envelope. Tests now assert the value rather
than the envelope shape.

Renames the long-value test's local `expression` to `literal` for
consistency with the other two cases.
…ack-compat read path retained)

Restores migrate_signed_envelopes_to_plain_literals (originally added
in a9e98cc, removed by 353a7ad). Walks every reifier, finds targets
of the form literal:json:<envelope> whose JSON has data+author+proof,
extracts .data, re-encodes as the appropriate plain literal type, and
rebuilds both the direct triple and the reifier (the reifier IRI hash
includes the target, so it must be recomputed). Removes the old direct
triple only if no other reifiers still reference it. Migration is
idempotent — short-circuits when migration_version() >= 3 — and runs
during perspective initialization right after the named-graphs
migration (v2 -> v3).

Also annotates the back-compat envelope-unwrap branches in
parse_literal_fn (sparql_store.rs) and parse_literal_value
(model_query/utils.rs) with a note that they exist for pre-migration
data; new writes use plain literal: forms.

See §9.1 / §9.6 of literal-encoding-and-sparql-pushdown-2026-06-03.md.
Equality WHERE filters on `resolveLanguage: literal` properties now emit
direct IRI matches against the deterministic encoding produced by
`resolve_property_value` (Phase 1):

  ?source <pred> <literal:string:VAL_ENCODED> .
  ?source <pred> <literal:number:N> .
  ?source <pred> <literal:boolean:true|false> .

Previously each row was checked via `STR(fn/parse_literal(?v)) = "val"`
inside a FILTER, which forced Oxigraph to scan every triple bound to the
predicate. Switching to a direct IRI lets the POS index probe straight to
the matching row.

The `fn/parse_literal` BIND + FILTER path is preserved for:
- `WhereCondition::Ops` (gt/lt/gte/lte/between/contains/not) — typed
  comparison still needs the unwrapped value.
- Properties without `resolveLanguage: literal` — back-compat for raw
  string storage.
- Legacy envelope-form data that hasn't yet been migrated by
  `migrate_signed_envelopes_to_plain_literals` (Phase 2) — `parse_literal`
  also unwraps `.data` from signed envelopes.

Where a where-value also looks like an absolute IRI (scheme + colon),
a UNION fallback matches the raw-IRI form too — covers the case where a
constructor's initial value was stored as a raw URI on a property whose
shape declares `resolveLanguage: literal` (enum-like state defaults).

Values are validated before injection: strings are
`NON_ALPHANUMERIC`-percent-encoded (matching `literal_encode`), numbers
must be finite, booleans must be `true|false`. Non-finite numeric
filters short-circuit to `FILTER(false)`.

Refs: research/literal-encoding-and-sparql-pushdown-2026-06-03.md (V4)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Mirror the V4 transformation in the projection where-clause builder
(`build_projection_where_patterns`). Equality on a target-shape property
declared `resolveLanguage: literal` now emits a direct IRI match against
`<literal:string:X>` / `<literal:number:N>` / `<literal:boolean:b>`
instead of `STR(fn/parse_literal(?v)) = "X"`.

The pred_lookup now carries `(predicate, is_literal_prop)` so the
projection layer can distinguish properties whose targets are stored as
literal IRIs (use POS-index probe) from those backed by raw URIs or
unknown storage (keep the fn/parse_literal fallback). String/StringArray
inputs that themselves look like absolute IRIs also include the raw-IRI
form via VALUES, matching V4's UNION fallback semantics.

Refs: research/literal-encoding-and-sparql-pushdown-2026-06-03.md (V5)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…s (T4)

Restructure the WHERE-clause regression tests around the Phase 1/2/3
storage contract:

- Rename helper `signed_envelope_literal` -> `legacy_envelope_literal`.
  The envelope form models pre-migration data shape only; it is not the
  current write path.
- `test_signed_envelope_where_paginate_count` ->
  `test_legacy_envelope_migrated_then_paginate_count`. Now seeds
  envelope-form data, runs `migrate_signed_envelopes_to_plain_literals`,
  then asserts the V4 direct-IRI WHERE probe finds the migrated rows.
- `test_mixed_plain_and_signed_envelope_where` ->
  `test_legacy_mixed_migrated_then_contains`. Same migration-then-query
  pattern; mixes pre-existing plain literals with envelope rows to verify
  the migration is idempotent on plain data.
- Add `test_plain_literal_where_paginate_count` — parallel coverage
  using plain literals from the start (no migration call). Asserts the
  V4 POS-index path returns identical results to the legacy migrated
  test.
- Add `test_plain_literal_contains_works_on_fn_parse_literal_path` —
  exercises `WhereOps::contains`, which still routes through
  `fn/parse_literal`, against plain `literal:string:X` storage. Confirms
  the fallback path still produces correct substring semantics.

`test_resolve_projections_where_filter_via_target_shape_property` is
unchanged — it already stored plain `literal:string:like_type_id123`
and continues to pass because V5 emits a direct IRI probe matching the
stored form.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR changes literal decoding to return primitives directly, adds helpers to encode deterministic literal:* IRIs, implements an idempotent migration to convert signed-envelope literal:json: targets to typed plain literals, updates SPARQL WHERE generation to match literal properties against deterministic literal IRIs, and expands test and benchmark coverage.

Changes

Literal value encoding and migration refactor

Layer / File(s) Summary
Core literal decoding behavior
rust-executor/src/languages/literal.rs, core/src/perspectives/SparqlBindings.ts
literal_decode returns primitive JSON values directly; parseLit() now stringifies decoded objects unconditionally instead of preferring a .data field. Tests updated for direct equality on primitives and subscription wait timeout increased.
Literal encoding and formatting utilities
rust-executor/src/perspectives/model_query/utils.rs
literal_percent_encode promoted to shared scope; format_literal_number added for finite f64 formatting into literal:number: tails; looks_like_absolute_iri added to heuristically detect absolute IRIs; validate_iri tightened and docs updated.
Property resolution and write-path
rust-executor/src/perspectives/perspective_instance.rs, core/src/Literal.ts, core/src/model/Ad4mModel.ts, core/src/model/query-sparql.ts
resolve_property_value encodes JSON values into deterministic literal:* IRIs for properties with resolve_language == "literal", returning early and bypassing expression creation; Literal.get() gains boolean: support and Ad4mModel write-path uses valueToLiteralIri.
Signed-envelope to plain-literal migration
rust-executor/src/perspectives/sparql_store.rs
New migrate_signed_envelopes_to_plain_literals() scans reifier-backed links with literal:json:<signed_envelope> targets, decodes/unwraps envelopes, writes typed plain-literal targets, rebuilds reifier IRIs, copies metadata, removes old metadata/triples when safe, and sets migration version to 3.
Migration invocation during initialization
rust-executor/src/perspectives/mod.rs
Perspective initialization now idempotently invokes the migration on each store during initialize_from_db, logging converted counts and returning early on migration failure.
SPARQL generation for literal-property filtering
rust-executor/src/perspectives/model_query/projection.rs, rust-executor/src/perspectives/model_query/sparql_builder.rs
Query builders now carry an is_literal_prop flag per predicate and emit deterministic literal:* IRI matches (VALUES/triple patterns; UNION when inputs look like absolute IRIs) for literal properties; numeric formatting failures emit FILTER(false) fallbacks; non-literal props continue using parse_literal-based FILTERs.
Test coverage for migration and literal behavior
rust-executor/src/perspectives/model_query/integration_tests.rs, tests/js/tests/prolog-and-literals.test.ts, tests/js/tests/mcp-http.test.ts
Integration tests added/adjusted for legacy-envelope migration, mixed plain/envelope scenarios, plain-literal WHERE/pagination/count and contains semantics, a release-only benchmark comparing indexed IRI vs parse_literal FILTER, and JS tests updated to assert direct decoded values and deterministic literal:* targets.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • coasys/ad4m#695: Touches Ad4mModel.setProperty handling related to literal serialization and resolve-language semantics.
  • coasys/ad4m#831: Directly related — modifies parseLit() decoding/object-handling semantics in core/src/perspectives/SparqlBindings.ts.

Suggested reviewers

  • lucksus

"🐇 I peeked inside envelopes tight,
plain literals sprung into the light,
IRIs encoded, tests align,
migration hummed down the line,
queries skip the extra bite."

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'refactor: deterministic literal property targets + indexed WHERE filters' accurately summarizes the main change: transitioning from signed-envelope expressions to deterministic literal IRIs for property storage and optimizing WHERE filters with direct IRI matching.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch refactor/literal-channel-v-separation

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

HexaField added 3 commits June 4, 2026 00:24
The previous pass left a number of comments referencing internal phase
labels (V1/V4/V5), a 'Channel V/E' design vocabulary, and a 2026-06-03
research doc path. Replace those with comments that describe what the
code does and why a reader can't infer it from the code itself.
Compares two equivalent SPARQL queries against the same data — the
indexed direct-IRI probe that the WHERE builders now emit, and the
fn/parse_literal-wrapped FILTER they used to emit. Gated to release
builds via cfg!(debug_assertions); scale tunable via WT_BENCH_LINKS.

Indexed stays flat at ~12-40us across 1k-50k links (POS index probe).
Filter scales linearly because every row materialises through the
custom function.
@HexaField HexaField changed the title refactor: Channel V / Channel E literal-encoding separation refactor: deterministic literal property targets + indexed WHERE filters Jun 4, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
rust-executor/src/perspectives/model_query/sparql_builder.rs (1)

503-523: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

The is_literal_prop check is too broad in the main WHERE builder too.

This path now treats any property with a resolve_language as if it were stored as a deterministic literal:* target. Only resolve_language == "literal" has that storage contract; other resolvers will now turn equality filters into false negatives. Gate the fast-path on Some("literal") here as well.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rust-executor/src/perspectives/model_query/sparql_builder.rs` around lines
503 - 523, The current fast-path uses is_literal_prop =
prop.resolve_language.is_some(), which wrongly treats any resolver as the
deterministic `literal:*` storage; change the check in the WHERE builder (where
is_literal_prop is set and used inside the WhereCondition::String match) to only
true when resolve_language equals "literal" (e.g. compare prop.resolve_language
to Some("literal") or use as_deref() == Some("literal")), so the
literal-percent-encode/IRI UNION branch only runs for actual literal resolvers.
rust-executor/src/perspectives/sparql_store.rs (1)

85-96: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't unwrap arbitrary JSON objects that merely have a data key.

This now collapses every literal:json: payload with a top-level data field to the inner value. Genuine JSON like {"data":"x","kind":"message"} will compare as "x" instead of the stored object, so fn/parse_literal starts returning wrong matches. Gate the unwrap on the full signed-envelope shape before stripping to .data.

Based on learnings, the .data extraction pattern is only intentional in contexts that explicitly want to reduce Expression objects to payload values; it is not a universal literal-decoding rule.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rust-executor/src/perspectives/sparql_store.rs` around lines 85 - 96, The
code currently strips any JSON object with a top-level "data" key which
collapses legitimate literal:json values; restrict the unwrap to only
signed-expression envelopes by verifying the full envelope shape before
extracting `.data`: parse `decoded` into `json_val` as you already do, then
check that `json_val` is an object and contains the specific envelope keys (e.g.
"author", "timestamp", "proof") in addition to "data" before building `data_str`
and returning `Literal::new_simple_literal(&data_str).into()`; if those envelope
keys are not present, fall through and keep the original `decoded` handling (do
not unwrap based only on presence of `data`).

Source: Learnings

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@rust-executor/src/perspectives/mod.rs`:
- Around line 122-139: The migration failure is currently only logged and
initialization continues, which leaves the perspective in an inconsistent state;
update the logic in the init path that calls
p.sparql_store.migrate_signed_envelopes_to_plain_literals() (referenced here and
by handle_clone.uuid) to not proceed on Err: either propagate/return the error
to fail perspective initialization (so startup stops) or set a conservative flag
on the perspective to force the non-indexed WHERE/equality query path until
migration succeeds; implement one of these two behaviors and ensure the Err(e)
branch calls the chosen failure path instead of merely logging a warning.

In `@rust-executor/src/perspectives/model_query/integration_tests.rs`:
- Around line 4464-4477: The benchmark currently only checks that the JSON
result contains "test://row/{needle_idx}" which allows overmatching; after each
timed call to store.query(&indexed) and store.query(&filtered) (variables
indexed_total/filtered_total and function store.query), parse the returned JSON
string r into the expected response structure and assert that the response
contains exactly one item and that its "source" equals
format!("test://row/{needle_idx}"); perform these strict assertions immediately
after measuring and before accumulating timing to ensure the benchmark fails on
extra matches.

In `@rust-executor/src/perspectives/model_query/projection.rs`:
- Around line 299-305: The code currently marks any resolved property as a
literal by using p.resolve_language.is_some(); change this to only treat it as
literal when the resolver equals the literal resolver string (e.g. compare
p.resolve_language to Some("literal") or use as_deref() == Some("literal")).
Update the map insertion in the loop over target_shape.properties (where you
build map.insert(p.name.clone(), (p.predicate.clone(), ...))) so the second
tuple element is true only for an exact "literal" resolver value rather than any
Some(_), keeping the rest of the projection logic unchanged.

In `@rust-executor/src/perspectives/model_query/utils.rs`:
- Around line 51-67: The function looks_like_absolute_iri currently trusts
validate_iri but must first reject strings containing control or whitespace
characters to avoid emitting malformed <...> IRIREFs; update
looks_like_absolute_iri to early-return false if s contains any control or
whitespace (e.g. via s.chars().any(|c| c.is_control() || c.is_whitespace()))
before calling validate_iri and the existing colon/first-byte checks so values
with \n, \r, \t or other control/whitespace are not treated as absolute IRIs.

In `@rust-executor/src/perspectives/sparql_store.rs`:
- Around line 1221-1236: The migration builds new_target from data using ad-hoc
formatting (the match on serde_json::Value producing
literal:string|number|boolean|json), causing non-deterministic IRIs; replace
that match with a call to the project’s canonical literal encoder used by new
writes and equality probes (instead of manual utf8_percent_encode/format logic)
so new_target is produced by the same helper given data (serde_json::Value) and
preserves identical "literal:..." IRIs; ensure you wire the helper's return into
new_target and preserve any error/fallback behavior the helper provides.

---

Outside diff comments:
In `@rust-executor/src/perspectives/model_query/sparql_builder.rs`:
- Around line 503-523: The current fast-path uses is_literal_prop =
prop.resolve_language.is_some(), which wrongly treats any resolver as the
deterministic `literal:*` storage; change the check in the WHERE builder (where
is_literal_prop is set and used inside the WhereCondition::String match) to only
true when resolve_language equals "literal" (e.g. compare prop.resolve_language
to Some("literal") or use as_deref() == Some("literal")), so the
literal-percent-encode/IRI UNION branch only runs for actual literal resolvers.

In `@rust-executor/src/perspectives/sparql_store.rs`:
- Around line 85-96: The code currently strips any JSON object with a top-level
"data" key which collapses legitimate literal:json values; restrict the unwrap
to only signed-expression envelopes by verifying the full envelope shape before
extracting `.data`: parse `decoded` into `json_val` as you already do, then
check that `json_val` is an object and contains the specific envelope keys (e.g.
"author", "timestamp", "proof") in addition to "data" before building `data_str`
and returning `Literal::new_simple_literal(&data_str).into()`; if those envelope
keys are not present, fall through and keep the original `decoded` handling (do
not unwrap based only on presence of `data`).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 59873fa3-4060-4719-b8a2-a39d959ca255

📥 Commits

Reviewing files that changed from the base of the PR and between b2920ec and 08240a0.

📒 Files selected for processing (10)
  • core/src/perspectives/SparqlBindings.ts
  • rust-executor/src/languages/literal.rs
  • rust-executor/src/perspectives/mod.rs
  • rust-executor/src/perspectives/model_query/integration_tests.rs
  • rust-executor/src/perspectives/model_query/projection.rs
  • rust-executor/src/perspectives/model_query/sparql_builder.rs
  • rust-executor/src/perspectives/model_query/utils.rs
  • rust-executor/src/perspectives/perspective_instance.rs
  • rust-executor/src/perspectives/sparql_store.rs
  • tests/js/tests/prolog-and-literals.test.ts

Comment thread rust-executor/src/perspectives/mod.rs Outdated
Comment thread rust-executor/src/perspectives/model_query/integration_tests.rs Outdated
Comment thread rust-executor/src/perspectives/model_query/projection.rs
Comment thread rust-executor/src/perspectives/model_query/utils.rs
Comment thread rust-executor/src/perspectives/sparql_store.rs Outdated
HexaField and others added 5 commits June 8, 2026 21:51
Force CircleCI to re-evaluate; previous webhook delivery failed with
`why: github, start_time: None` (no runner accepted the job).  This
is a no-op commit on the branch tip purely to nudge the queue.
Five comments from CodeRabbit on the v3 literal-target refactor:

* mod.rs (Major) — `migrate_signed_envelopes_to_plain_literals` now
  fail-stops perspective init on error. Previously the migration error
  was only warn-logged, so the perspective came up serving a mix of
  legacy `literal:json:<envelope>` and migrated `literal:*` targets, and
  the new indexed WHERE filters silently missed the unmigrated rows.
  The error path now `log::error!`s and returns from the init spawn;
  the next executor restart can retry cleanly.

* projection.rs:304 + sparql_builder.rs:512 (Major) — tighten the
  literal fast-path detection from `resolve_language.is_some()` to
  `resolve_language.as_deref() == Some("literal")`. Only the built-in
  `"literal"` resolver produces deterministic `literal:*` targets; any
  other resolver wraps values in author-signed expression IRIs and the
  former check would have emitted encoded-literal probes against them,
  silently dropping matches.

* sparql_store.rs:1221 (Major) — the v3 migration's hand-rolled per-
  variant `format!("literal:{kind}:{...}", …)` block is gone. The
  migration now reuses the canonical `crate::languages::literal_encode`
  helper that fresh writes flow through, so migrated rows and new writes
  produce identical target IRIs. Without this, an integer-shaped float
  like `1.0` landed as `literal:number:1.0` from the migration while
  fresh writes / WHERE probes produced `literal:number:1` (via
  `format_literal_number`'s integer collapse), causing migrated rows
  to silently miss exact-match filters.

* utils.rs:67 (Minor) — `validate_iri` now rejects any control or
  whitespace character (not just ASCII space), so `\n`/`\r`/`\t` and
  Unicode whitespace can no longer slip past the heuristic and emit
  malformed `<…>` IRIREFs in generated SPARQL.

* integration_tests.rs:4480 (Minor) — the
  `bench_indexed_iri_vs_fn_parse_literal_filter` benchmark now parses
  each response and asserts exactly one row with `source ==
  test://row/{needle_idx}`. The previous `contains` check would have
  passed silently if either plan started overmatching.

Also drop the now-unused `utf8_percent_encode` / `NON_ALPHANUMERIC`
imports in the migration path.

JS test surface:
* mcp-http.test.ts §5b — flip the "resolveLanguage for boolean/string
  properties" assertions to match the new contract: properties with
  `resolveLanguage: "literal"` now produce deterministic
  `literal:boolean:` / `literal:string:` targets, NOT the legacy
  `literal:json:<signed envelope>` form. Provenance for the link as a
  whole lives on the RDF 1.2 reifier, so wrapping each property value
  in its own signed envelope was redundant and defeated indexed
  equality lookups.
…writes

`setProperty` was always routing values through `_perspective.createExpression`
when `resolveLanguage` was set, which for the built-in `"literal"` resolver
produced `literal:json:<signed envelope>` targets — defeating the indexed
WHERE path #837 puts in place and breaking round-trip assertions like
`expect(await todo.title).to.equal("new title")` (the read decodes the
envelope and surfaces the 4-key `{author, timestamp, data, proof}` object
instead of `"new title"`).

Mirror the Rust-side bypass already in `resolve_property_value`: when
`resolveLanguage === "literal"`, encode the value with `Literal.from(value)
.toUrl()` directly. Other resolvers still go through `createExpression` so
non-literal resolveLanguage semantics are unchanged.

Also fill in `Literal.get()`'s missing `boolean:` branch so callers that
read a deterministic `literal:boolean:` target don't throw "Can't parse
unknown literal" when consuming the new write format end-to-end.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@core/src/model/Ad4mModel.ts`:
- Around line 1094-1105: The literal resolver branch is serializing all strings
as literal:string:..., but queryToSPARQL leaves absolute-IRI-like strings raw,
causing mismatched equality checks; update the resolveLanguage === "literal"
branch to detect absolute IRIs (use the same IRI-detection logic as
queryToSPARQL's check) and, for strings that are absolute IRIs, leave them as
raw IRIs instead of wrapping with Literal.from(...).toUrl(); otherwise continue
using Literal.from(value).toUrl() and for non-string values fall back to
this._perspective.createExpression(value, resolveLanguage).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7444555a-171b-4876-bcf7-3deea0dea240

📥 Commits

Reviewing files that changed from the base of the PR and between 7ff5fbf and 561aa38.

📒 Files selected for processing (2)
  • core/src/Literal.ts
  • core/src/model/Ad4mModel.ts

Comment thread core/src/model/Ad4mModel.ts
…query

CodeRabbit caught a divergence I introduced in the previous fixup: the
TS-side `setProperty` bypass wrapped *every* string with
`Literal.from(value).toUrl()`, so writing `"https://example.com"` to a
`resolveLanguage: "literal"` property landed as
`<literal:string:https%3A...>` while `queryToSPARQL()` still emitted
`<https://example.com>` for the same filter — exact-match WHERE clauses
silently missed freshly written values.

Align the three serialization paths so they all keep URI-shaped strings
raw and only wrap primitives:

* `core/src/model/query-sparql.ts` — export `valueToLiteralIri` so it's
  the single source of truth for write/read symmetry.
* `core/src/model/Ad4mModel.ts` `setProperty` — route the literal
  bypass through `valueToLiteralIri` instead of calling
  `Literal.from(...).toUrl()` directly.
* `rust-executor/src/perspectives/perspective_instance.rs`
  `resolve_property_value` — same logic for the `resolve_language ==
  "literal"` short-circuit (strings with a URI scheme return as-is,
  everything else flows through `literal_encode`), matching the
  no-resolveLanguage branch immediately below.

`looksLikeUri` / `URI_SCHEME_RE` use the identical
`^[a-zA-Z][a-zA-Z0-9+\-._]*:` pattern on both sides so the write and
query produce byte-identical IRIs for any given value.
The "should demonstrate subscription performance" test polled every 10ms
with a hard 5s ceiling. That's plenty for steady-state subscription
latency (sub-second locally) but CI workers regularly take longer to
warm up the first subscription's GraphQL → executor → SPARQL → subscriber
round-trip after a fresh `save()`, leading to spurious failures on
otherwise-green runs.

Match the 30s waitForCondition ceiling the surrounding tests already use
(see `a5ad343ce test: bump waitForCondition timeouts to 60s for CI
stability` for the precedent). Steady-state latency is still logged via
`subscriptionLatency`, so a real regression to >30s would still show up
as a hang then a failure, but flake from a slow first round-trip on a
loaded CI worker won't fail the suite anymore.
lucksus and others added 2 commits June 11, 2026 11:08
The earlier 5s → 30s bump cleared the flake when this branch was at
121e308, but the recent merge from dev (which pulled in #848's
lazy-load resolveLanguage change) added enough first-fetch latency on a
freshly registered SDNA class that integration-tests-js #17171 hit the
30s ceiling on the very same "should demonstrate subscription
performance" test.

Match the 60s waitForCondition ceiling that's already used by every
other subscription test in this suite. The CI flake on #837 is the
only blocker keeping the stack from going green; steady-state latency
is still logged via `subscriptionLatency`, so a real regression past
60s would still surface as a slow log line rather than be hidden by
the bumped ceiling.
@HexaField HexaField marked this pull request as draft June 11, 2026 09:58
@HexaField HexaField marked this pull request as ready for review June 12, 2026 02:45
HexaField added a commit that referenced this pull request Jun 16, 2026
Five comments from CodeRabbit on the v3 literal-target refactor:

* mod.rs (Major) — `migrate_signed_envelopes_to_plain_literals` now
  fail-stops perspective init on error. Previously the migration error
  was only warn-logged, so the perspective came up serving a mix of
  legacy `literal:json:<envelope>` and migrated `literal:*` targets, and
  the new indexed WHERE filters silently missed the unmigrated rows.
  The error path now `log::error!`s and returns from the init spawn;
  the next executor restart can retry cleanly.

* projection.rs:304 + sparql_builder.rs:512 (Major) — tighten the
  literal fast-path detection from `resolve_language.is_some()` to
  `resolve_language.as_deref() == Some("literal")`. Only the built-in
  `"literal"` resolver produces deterministic `literal:*` targets; any
  other resolver wraps values in author-signed expression IRIs and the
  former check would have emitted encoded-literal probes against them,
  silently dropping matches.

* sparql_store.rs:1221 (Major) — the v3 migration's hand-rolled per-
  variant `format!("literal:{kind}:{...}", …)` block is gone. The
  migration now reuses the canonical `crate::languages::literal_encode`
  helper that fresh writes flow through, so migrated rows and new writes
  produce identical target IRIs. Without this, an integer-shaped float
  like `1.0` landed as `literal:number:1.0` from the migration while
  fresh writes / WHERE probes produced `literal:number:1` (via
  `format_literal_number`'s integer collapse), causing migrated rows
  to silently miss exact-match filters.

* utils.rs:67 (Minor) — `validate_iri` now rejects any control or
  whitespace character (not just ASCII space), so `\n`/`\r`/`\t` and
  Unicode whitespace can no longer slip past the heuristic and emit
  malformed `<…>` IRIREFs in generated SPARQL.

* integration_tests.rs:4480 (Minor) — the
  `bench_indexed_iri_vs_fn_parse_literal_filter` benchmark now parses
  each response and asserts exactly one row with `source ==
  test://row/{needle_idx}`. The previous `contains` check would have
  passed silently if either plan started overmatching.

Also drop the now-unused `utf8_percent_encode` / `NON_ALPHANUMERIC`
imports in the migration path.

JS test surface:
* mcp-http.test.ts §5b — flip the "resolveLanguage for boolean/string
  properties" assertions to match the new contract: properties with
  `resolveLanguage: "literal"` now produce deterministic
  `literal:boolean:` / `literal:string:` targets, NOT the legacy
  `literal:json:<signed envelope>` form. Provenance for the link as a
  whole lives on the RDF 1.2 reifier, so wrapping each property value
  in its own signed envelope was redundant and defeated indexed
  equality lookups.
HexaField added a commit that referenced this pull request Jun 16, 2026
…writes

`setProperty` was always routing values through `_perspective.createExpression`
when `resolveLanguage` was set, which for the built-in `"literal"` resolver
produced `literal:json:<signed envelope>` targets — defeating the indexed
WHERE path #837 puts in place and breaking round-trip assertions like
`expect(await todo.title).to.equal("new title")` (the read decodes the
envelope and surfaces the 4-key `{author, timestamp, data, proof}` object
instead of `"new title"`).

Mirror the Rust-side bypass already in `resolve_property_value`: when
`resolveLanguage === "literal"`, encode the value with `Literal.from(value)
.toUrl()` directly. Other resolvers still go through `createExpression` so
non-literal resolveLanguage semantics are unchanged.

Also fill in `Literal.get()`'s missing `boolean:` branch so callers that
read a deterministic `literal:boolean:` target don't throw "Can't parse
unknown literal" when consuming the new write format end-to-end.
HexaField added a commit that referenced this pull request Jun 16, 2026
The earlier 5s → 30s bump cleared the flake when this branch was at
121e308, but the recent merge from dev (which pulled in #848's
lazy-load resolveLanguage change) added enough first-fetch latency on a
freshly registered SDNA class that integration-tests-js #17171 hit the
30s ceiling on the very same "should demonstrate subscription
performance" test.

Match the 60s waitForCondition ceiling that's already used by every
other subscription test in this suite. The CI flake on #837 is the
only blocker keeping the stack from going green; steady-state latency
is still logged via `subscriptionLatency`, so a real regression past
60s would still surface as a slow log line rather than be hidden by
the bumped ceiling.
@HexaField

Copy link
Copy Markdown
Contributor Author

Scope absorbed into #842 — resolveLiteral flag replaces the literal-channel separation approach.

@HexaField HexaField closed this Jun 16, 2026
HexaField added a commit that referenced this pull request Jun 18, 2026
Five comments from CodeRabbit on the v3 literal-target refactor:

* mod.rs (Major) — `migrate_signed_envelopes_to_plain_literals` now
  fail-stops perspective init on error. Previously the migration error
  was only warn-logged, so the perspective came up serving a mix of
  legacy `literal:json:<envelope>` and migrated `literal:*` targets, and
  the new indexed WHERE filters silently missed the unmigrated rows.
  The error path now `log::error!`s and returns from the init spawn;
  the next executor restart can retry cleanly.

* projection.rs:304 + sparql_builder.rs:512 (Major) — tighten the
  literal fast-path detection from `resolve_language.is_some()` to
  `resolve_language.as_deref() == Some("literal")`. Only the built-in
  `"literal"` resolver produces deterministic `literal:*` targets; any
  other resolver wraps values in author-signed expression IRIs and the
  former check would have emitted encoded-literal probes against them,
  silently dropping matches.

* sparql_store.rs:1221 (Major) — the v3 migration's hand-rolled per-
  variant `format!("literal:{kind}:{...}", …)` block is gone. The
  migration now reuses the canonical `crate::languages::literal_encode`
  helper that fresh writes flow through, so migrated rows and new writes
  produce identical target IRIs. Without this, an integer-shaped float
  like `1.0` landed as `literal:number:1.0` from the migration while
  fresh writes / WHERE probes produced `literal:number:1` (via
  `format_literal_number`'s integer collapse), causing migrated rows
  to silently miss exact-match filters.

* utils.rs:67 (Minor) — `validate_iri` now rejects any control or
  whitespace character (not just ASCII space), so `\n`/`\r`/`\t` and
  Unicode whitespace can no longer slip past the heuristic and emit
  malformed `<…>` IRIREFs in generated SPARQL.

* integration_tests.rs:4480 (Minor) — the
  `bench_indexed_iri_vs_fn_parse_literal_filter` benchmark now parses
  each response and asserts exactly one row with `source ==
  test://row/{needle_idx}`. The previous `contains` check would have
  passed silently if either plan started overmatching.

Also drop the now-unused `utf8_percent_encode` / `NON_ALPHANUMERIC`
imports in the migration path.

JS test surface:
* mcp-http.test.ts §5b — flip the "resolveLanguage for boolean/string
  properties" assertions to match the new contract: properties with
  `resolveLanguage: "literal"` now produce deterministic
  `literal:boolean:` / `literal:string:` targets, NOT the legacy
  `literal:json:<signed envelope>` form. Provenance for the link as a
  whole lives on the RDF 1.2 reifier, so wrapping each property value
  in its own signed envelope was redundant and defeated indexed
  equality lookups.
HexaField added a commit that referenced this pull request Jun 18, 2026
…writes

`setProperty` was always routing values through `_perspective.createExpression`
when `resolveLanguage` was set, which for the built-in `"literal"` resolver
produced `literal:json:<signed envelope>` targets — defeating the indexed
WHERE path #837 puts in place and breaking round-trip assertions like
`expect(await todo.title).to.equal("new title")` (the read decodes the
envelope and surfaces the 4-key `{author, timestamp, data, proof}` object
instead of `"new title"`).

Mirror the Rust-side bypass already in `resolve_property_value`: when
`resolveLanguage === "literal"`, encode the value with `Literal.from(value)
.toUrl()` directly. Other resolvers still go through `createExpression` so
non-literal resolveLanguage semantics are unchanged.

Also fill in `Literal.get()`'s missing `boolean:` branch so callers that
read a deterministic `literal:boolean:` target don't throw "Can't parse
unknown literal" when consuming the new write format end-to-end.
HexaField added a commit that referenced this pull request Jun 18, 2026
The earlier 5s → 30s bump cleared the flake when this branch was at
121e308, but the recent merge from dev (which pulled in #848's
lazy-load resolveLanguage change) added enough first-fetch latency on a
freshly registered SDNA class that integration-tests-js #17171 hit the
30s ceiling on the very same "should demonstrate subscription
performance" test.

Match the 60s waitForCondition ceiling that's already used by every
other subscription test in this suite. The CI flake on #837 is the
only blocker keeping the stack from going green; steady-state latency
is still logged via `subscriptionLatency`, so a real regression past
60s would still surface as a slow log line rather than be hidden by
the bumped ceiling.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants