Performance hot paths plus attribute set semantics, entity involvements, and serializer ordering #16

Merged: amateescu merged 4 commits into main from performance-and-api-cleanup on Jun 15, 2026

@amateescu

Changes

  • Speed up the serialization and namespace-pruning hot paths: a fast path in QualifiedNameEscaper::escape(), a lookup cache on PrefixMinter::prefixFor(), lazy blank-label collection in JsonSerializer, qualified-name string-form reuse, a decode guard on JSON deserialization, and namespace-URI-based pruning in the builders.
  • Adopt set semantics for attribute values: Attributes now collapses identical attribute-value pairs in its constructor (PROV-DM models a record's attributes as a set of pairs), guarded so single-valued keys pay nothing. Value identity moves to the new internal ValueIdentity, shared with DocumentComparator so dedup and equality always agree, including the bare-scalar vs typed-Literal cross-type case. The only observable change is a lower count() for inputs that carried duplicate pairs.
  • Remove the unused Attributes::merge(): it had no production callers, and the multimap is built via with(), from(), and AttributesBuilder.
  • Add RecordContainer::entityInvolvements(), a forward pass yielding a public EntityInvolvement (relationType, role, entity) per entity-typed relation endpoint, so a consumer can derive a secondary index from a finished Document or Bundle. A typed derivation reports its subtype (wasRevisionOf etc.), and role distinguishes endpoints such as generatedEntity from usedEntity.
  • Add the opt-in RecordBuilder::autoDeclareEntities(): at build() it emits a bare Entity for any entity-typed relation endpoint no record declares, skipping blank nodes and already-declared identifiers and propagating to bundles. Off by default.
  • Always sort namespace declarations in every serializer (the prov and xsd built-ins first, then alphabetically by prefix), and add an opt-in sortRecords flag that orders records into PROV-DM concept order (elements, then relations in component order, then by identifier). Ordering is non-semantic in every PROV format, so round-trips and semantic equality are unaffected.
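
The set-semantics change for attribute values can be pictured with a small, language-neutral sketch in Python. It is not the library's code: `collapse_pairs` and `default_identity` are hypothetical names, and the identity rule (type name plus `repr`) is an assumption standing in for whatever ValueIdentity actually does. The point is the shape: a cheap guard skips the work when no key repeats, and one identity function decides what counts as a duplicate.

```python
from typing import Callable, Hashable, Iterable, List, Tuple

Pair = Tuple[str, object]


def default_identity(value: object) -> Hashable:
    # Assumption for this sketch: type name plus repr, so 1, True and "1"
    # stay distinct. The real ValueIdentity rules are not shown in the PR.
    return (type(value).__name__, repr(value))


def collapse_pairs(
    pairs: Iterable[Pair],
    identity: Callable[[object], Hashable] = default_identity,
) -> List[Pair]:
    """Drop repeated (key, value) pairs, keeping first-seen order."""
    items = list(pairs)

    # Guard: if every key occurs once, no pair can be a duplicate,
    # so the per-value identity computation is skipped entirely.
    keys = [key for key, _ in items]
    if len(set(keys)) == len(keys):
        return items

    seen = set()
    result: List[Pair] = []
    for key, value in items:
        marker = (key, identity(value))
        if marker not in seen:
            seen.add(marker)
            result.append((key, value))
    return result


if __name__ == "__main__":
    attrs = [("prov:type", "ex:Report"), ("prov:type", "ex:Report"), ("prov:label", "Q2")]
    print(collapse_pairs(attrs))
```

Passing the same `identity` callable to both the dedup step and any equality check is what keeps the two from disagreeing, which is the reason the PR gives for sharing ValueIdentity with DocumentComparator.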

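The namespace ordering described in the last bullet (built-ins first, then alphabetical by prefix) could be expressed as a single sort key. The following Python sketch is illustrative only: `sort_namespaces` and `BUILT_IN_PREFIXES` are hypothetical names, and placing prov ahead of xsd within the built-in group is an assumption read from the order in which the description lists them.

```python
from typing import Dict, List, Tuple

# Assumption: built-ins are emitted in this fixed order, prov before xsd.
BUILT_IN_PREFIXES = ("prov", "xsd")


def sort_namespaces(namespaces: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (prefix, uri) pairs: built-ins first, the rest by prefix."""

    def rank(item: Tuple[str, str]) -> Tuple[int, int, str]:
        prefix, _uri = item
        if prefix in BUILT_IN_PREFIXES:
            # Group 0: keep the fixed built-in order.
            return (0, BUILT_IN_PREFIXES.index(prefix), "")
        # Group 1: everything else, alphabetical by prefix.
        return (1, 0, prefix)

    return sorted(namespaces.items(), key=rank)


if __name__ == "__main__":
    declared = {
        "ex": "http://example.org/",
        "xsd": "http://www.w3.org/2001/XMLSchema#",
        "dcterms": "http://purl.org/dc/terms/",
        "prov": "http://www.w3.org/ns/prov#",
    }
    for prefix, uri in sort_namespaces(declared):
        print(prefix, uri)
```
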
@amateescu amateescu merged commit 6d3ccd9 into main Jun 15, 2026
7 checks passed
@amateescu amateescu deleted the performance-and-api-cleanup branch June 15, 2026 13:25