[PRD]: DPROD Catalog Publication (Outbound DCAT-AP / DPROD) #550

Description

@larsgeorge-db

Sibling PRD to #493 (External Marketplace Providers: inbound DCAT-AP). This PRD covers the outbound half: Ontos publishing its own products as a DCAT-AP / DPROD catalog so peer Ontos workspaces, external DCAT-AP catalogs, and AI/MCP agents can subscribe to it.

PRD: DPROD Catalog Publication (Outbound DCAT-AP / DPROD)

Problem Statement

Ontos already speaks the right vocabulary internally — ontos-ontology.ttl defines DataProduct, OutputPort, DataContract, DataDomain, Ownership — and externally on the way in: the External Marketplace Providers PRD (#493) commits Ontos to consuming DCAT-AP Turtle catalogs from third parties (Vibe Business, S&P, Snowflake Marketplace, internal data exchanges), caching them in the existing rdf_triples store under named graphs, and importing listings as linked copies.

What Ontos cannot do today is the mirror image: publish its own products as a DCAT-AP / DPROD catalog that another Ontos instance, an external open-data portal, or a cross-org AI agent can subscribe to. Every release on every workspace today is invisible to anything outside its own database. The consequences are concrete:

  • Federation is one-way. A customer with two Ontos workspaces (e.g. one per region, one per business unit) cannot have workspace B discover workspace A's products. [PRD]: External Marketplace Providers #493 builds the consumer half — there is no producer half to point it at.
  • AI/MCP agents cannot reason across instances. The MCP server exposes products from one workspace as proprietary JSON. A cross-org agent that needs "find me a Customer 360 product in any of my five Ontos instances with at least Silver maturity and Delta Sharing delivery" cannot run that query — the data is not linked, not standardised, and not addressable by IRI.
  • External catalogs cannot consume Ontos. EU Data Spaces, Gaia-X participants, national open-data portals, and large enterprises building their own DCAT-AP federations all expect a DCAT-AP TTL endpoint. Ontos can be on the receiving end of those (via [PRD]: External Marketplace Providers #493) but not on the publishing end.
  • The internal vocabulary is detached from the W3C/OMG standards it shadows. ontos:DataProduct looks like dprod:DataProduct and quacks like dcat:DatasetSeries, but no formal subclass relationship is asserted. Any downstream consumer that resolves ontos:DataProduct against the wider linked-data graph gets a dead end. The codebase already does the right thing for ODCS via owl:imports odcs: — DPROD, DCAT, PROV, and dcterms: get no such treatment.
  • Round-trips lose information. A product exported as ODPS YAML by Ontos A and imported by Ontos B via [PRD]: External Marketplace Providers #493 goes through application/vnd.odps+json cleanly, but every other dimension (ownership graph, lineage edges, domain taxonomy, support channels, custom properties) survives only as opaque blobs. If Ontos's published catalog were DPROD-native, the imported product would land in Ontos B with all of those dimensions intact and machine-queryable.

The asymmetry leaves a hole in the Data Mesh story: customers can buy into the mesh, but they cannot be the mesh. Without an outbound DPROD endpoint, every Ontos deployment is a leaf node in someone else's federation, never a peer.

Solution

Add a single, authenticated workspace-wide DPROD/DCAT-AP catalog endpoint to Ontos and align the internal ontology with DPROD / DCAT / PROV / dcterms: so the published catalog reflects the same vocabulary the rest of the linked-data world uses. The new endpoint is the symmetric mirror of #493's consumer: where #493 polls third-party DCAT-AP catalog URLs and caches them in rdf_triples under urn:provider:<id>:catalog, the new publisher endpoint generates the workspace's own DCAT-AP catalog into rdf_triples under urn:ontos:self:catalog and serves it as Turtle.

A new opt-in publication_scope=external (distinct from the existing organization and marketplace scopes) controls per-item visibility. Producers tick a single "Publish to external catalog" toggle on a Data Product or Data Contract; the item is then included in the workspace's DCAT-AP catalog the next time the catalog is rebuilt. Existing internal marketplace and organization scopes are untouched — external publication is a strictly additional opt-in, never implicit.

The ontology realignment is deliberately light. ontos-ontology.ttl gains five new prefixes (dcat:, dprod:, prov:, dcterms:, foaf:) and a small set of rdfs:subClassOf / owl:equivalentProperty assertions on the existing ontos:DataProduct, ontos:OutputPort, ontos:DataContract, ontos:DataDomain, ontos:Ownership classes. No SHACL shapes in v1, no ODRL, no DQV — those are deferred. The version bumps from 2.0.0 to 2.1.0, signalling additive backward-compatible enrichment.

The exporter is a pair of pure functions sitting alongside the existing DataProductsManager.build_odps_export. A Catalog Builder takes a CatalogContext (workspace base IRI, calling principal, audience filter) and returns a ParsedCatalog dataclass whose shape deliberately mirrors the input shape that #493's DcatCatalogParser produces — same listings, same attached distributions, same ontosmkt: extension vocabulary for delivery method, audience hints, and bundle parents. A DPROD Serializer turns that ParsedCatalog into a Turtle string via rdflib. Both are pure, fixture-driven, and have no database I/O of their own.

Output ports are serialised as dcat:DataService with dcat:endpointURL, carrying the ODCS contract as an attached dcat:Distribution with dcat:mediaType "application/vnd.odcs+json" — exactly the convention #493 prefers on import. Data products are dprod:DataProduct (also typed dcat:DatasetSeries) with dcterms:publisher derived from the active Business Owners, dcat:theme from the Data Domain, and a prov:wasDerivedFrom edge per relationship in EntityRelationshipDb of lineage kind. Delivery details (delta sharing share name, recipient profile URL) reuse the ontosmkt: extension vocabulary defined in #493 so that an Ontos-to-Ontos round-trip via #493 preserves them losslessly.
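
The mapping described above can be made concrete with a small, dependency-free sketch. It writes Turtle by hand only to stay self-contained; the PRD calls for rdflib in the real serializer. The input dict shape, the port IRI scheme, the `dprod:outputPort` link, and the DPROD namespace IRI are assumptions for illustration, not decisions taken by this PRD.

```python
import json
from urllib.parse import quote

PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dcterms": "http://purl.org/dc/terms/",
    "dprod": "https://ekgf.github.io/dprod/",  # assumed DPROD namespace IRI
    "prov": "http://www.w3.org/ns/prov#",
}


def lit(value):
    """Quote a string as a Turtle literal; JSON escaping is valid Turtle here."""
    return json.dumps(value, ensure_ascii=False)


def describe(subject, types, pairs):
    """Join one subject's statements using Turtle's ';' shorthand."""
    parts = ["a " + ", ".join(types)] + [f"{p} {o}" for p, o in pairs]
    return f"{subject} " + " ;\n    ".join(parts) + " ."


def product_to_turtle(base, product):
    """Render one product and its output ports as a Turtle document."""
    base = base.rstrip("/")
    product_url = f"{base}/api/data-products/{product['id']}"
    pairs = [
        ("dcterms:identifier", lit(product["id"])),
        ("dcterms:title", lit(product["name"])),
    ]
    # One prov:wasDerivedFrom edge per upstream product in the lineage.
    pairs += [
        ("prov:wasDerivedFrom", f"<{base}/api/data-products/{upstream}>")
        for upstream in product.get("derived_from", [])
    ]
    port_blocks = []
    for port in product.get("ports", []):
        # Hypothetical port IRI scheme; the PRD does not fix one.
        port_iri = f"<{product_url}/ports/{quote(port['name'], safe='')}>"
        pairs.append(("dprod:outputPort", port_iri))  # assumed linking predicate
        port_blocks.append(describe(
            port_iri,
            ["dprod:OutputPort", "dcat:DataService"],
            [("dcat:endpointURL", f"<{port['endpoint_url']}>")],
        ))
    header = "\n".join(f"@prefix {p}: <{iri}> ." for p, iri in PREFIXES.items())
    body = describe(f"<{product_url}>", ["dprod:DataProduct", "dcat:DatasetSeries"], pairs)
    return "\n\n".join([header, body] + port_blocks) + "\n"
```

Publisher, theme, attached distributions, and the ontosmkt: predicates are left out to keep the sketch short; they would be further predicate/object pairs on the same subjects.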

The HTTP surface is one route: GET /.well-known/dcat-catalog.ttl. Authenticated via the same SSO as the rest of Ontos; the caller's principal feeds the audience filter so that audience-restricted items are excluded for users who cannot see them internally. Anonymous access is off in v1. The route reads the materialised catalog from the urn:ontos:self:catalog named graph; a small CatalogRebuilder invalidates and rematerialises the graph whenever an externally-scoped product/contract/domain/owner changes (transactional, triggered from existing manager hooks), and on a low-frequency background job for safety.

Two new MCP tools surface the catalog to agents: describe_product_as_dprod(product_id) returns the product's TTL fragment, and sparql_search_catalog(query) runs a read-only SPARQL SELECT/ASK against the urn:ontos:self:catalog graph, post-filtered by the caller's audience. No new authentication primitives — MCP tokens already gate access.

Phasing is two slices, each independently reviewable and mergeable:

  • P1 — Ontology + Catalog Builder + Serializer. Light ontology realignment in ontos-ontology.ttl. Pure CatalogBuilder and DprodSerializer modules with fixtures. No HTTP route, no UX, no MCP tools, no DB changes beyond the new publication_scope enum value. Shippable as a self-contained library change with strong unit tests, including a round-trip integration test that feeds the serializer output back through [PRD]: External Marketplace Providers #493's DcatCatalogParser and asserts losslessness against a shared fixture set.
  • P2 — Endpoint, opt-in UX, MCP, materialisation. Add the /.well-known/dcat-catalog.ttl route, the per-product / per-contract opt-in toggle, the CatalogRebuilder materialiser, and the two MCP tools. UI is two changes: a toggle on the Data Product / Data Contract publish dialog, and a read-only "External Catalog Status" admin panel showing last rebuild time, item count, and per-domain breakdown.

User Stories

Ontology realignment

  1. As a data architect, I want ontos:DataProduct declared as rdfs:subClassOf dprod:DataProduct, dcat:DatasetSeries, so that a third-party SPARQL consumer querying for dcat:DatasetSeries discovers our products.
  2. As a data architect, I want ontos:OutputPort declared as rdfs:subClassOf dprod:OutputPort, dcat:DataService, so that downstream catalogs treat output ports as first-class queryable endpoints.
  3. As a data architect, I want ontos:DataContract declared as rdfs:subClassOf dprod:DataContract, so that the contract is addressable as a standards-typed artifact, not a vendor-specific blob.
  4. As a data architect, I want ontos:DataDomain declared as rdfs:subClassOf dcat:Catalog, so that domain-scoped sub-catalogs (a v2 affordance) come for free.
  5. As a data architect, I want ontos:Ownership enriched with dcterms:publisher / dcterms:creator / prov:wasAttributedTo equivalent properties, so that owners published in the catalog show up as foaf:Agent references rather than opaque strings.
  6. As a data architect, I want the ontology version bumped from 2.0.0 to 2.1.0 and the dc:modified date updated, so that consumers can detect the realignment via version metadata.
  7. As a data architect, I want the new prefixes (dcat:, dprod:, prov:, dcterms:, foaf:) added once at the top of the ontology file, so that all later assertions read cleanly.
  8. As a data architect, I want no SHACL shapes in v1, so that the realignment is purely additive and cannot break any existing consumer that loads the ontology.
  9. As an open-source maintainer, I want the realigned ontology rendered the same way owl:imports odcs: is rendered today, so that the pattern is consistent and obvious to future contributors.

Per-item opt-in (publication_scope=external)

  1. As a Data Producer, I want a new external value in the publication_scope enum for Data Products and Data Contracts, so that I can mark items for external catalog visibility without changing their internal marketplace status.
  2. As a Data Producer, I want a single "Publish to external catalog" toggle in the Data Product publish dialog, so that opting in is a one-click decision.
  3. As a Data Producer, I want the same toggle in the Data Contract publish dialog, so that contracts can be exposed externally without their parent product (e.g. a public schema spec that has no implementing product yet).
  4. As a Data Producer, I want toggling the switch on a draft / under-review product to be blocked with an inline message, so that I cannot accidentally publish externally before the item is internally released.
  5. As a Data Producer, I want toggling the switch off to immediately remove the item from the external catalog on the next rebuild, so that retraction is fast and unambiguous.
  6. As a Data Consumer (internal), I do not want to see the external scope toggle on items I cannot edit, so that the UX is identical to the existing marketplace toggle.
  7. As an Ontos admin, I want every change to publication_scope=external audit-logged with actor, before, and after, so that I have a forensic trail of what was exposed externally and when.
  8. As a Domain Steward, I want a settings switch on a Data Domain that defaults all child items to external=true on publish (off by default), so that a domain can opt in en masse without per-product clicks.
  9. As an Ontos admin, I want a settings-external-catalog permission that gates the admin panel and the rebuild controls, so that producer-level "publish to external" cannot be conflated with admin-level catalog configuration.
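
A minimal sketch of the opt-in guard and audit trail from stories 4 and 7 above. The status names, the dict-shaped item, and the `AuditEntry` fields are hypothetical; the PRD only states that draft / under-review items are blocked and that actor, before, and after are logged.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical status names for items that are not yet internally released.
BLOCKED_STATUSES = {"draft", "under_review"}


class ScopeChangeBlocked(ValueError):
    """Raised when an unreleased item is switched to external scope."""


@dataclass
class AuditEntry:
    actor: str
    item_id: str
    before: str
    after: str
    at: datetime


def change_publication_scope(item, new_scope, actor, audit_log):
    """Apply a scope change, refusing external publication of unreleased items."""
    if new_scope == "external" and item["status"] in BLOCKED_STATUSES:
        raise ScopeChangeBlocked(
            "Release the item internally before publishing it externally."
        )
    before = item["publication_scope"]
    if before == new_scope:
        return  # nothing changed, nothing to log
    item["publication_scope"] = new_scope
    # Log every transition into or out of the external scope.
    if "external" in (before, new_scope):
        audit_log.append(
            AuditEntry(actor, item["id"], before, new_scope, datetime.now(timezone.utc))
        )
```

Logging both directions means retractions show up in the forensic trail as well as exposures.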

Catalog endpoint

  1. As a peer Ontos workspace (acting via [PRD]: External Marketplace Providers #493), I want to fetch /.well-known/dcat-catalog.ttl and receive a valid DCAT-AP TTL document, so that I can register the publishing workspace as an External Marketplace Provider with no bespoke code.
  2. As an external DCAT-AP catalog crawler, I want the endpoint to honour standard HTTP caching headers (ETag, Last-Modified), so that I do not refetch unchanged catalogs.
  3. As an external DCAT-AP catalog crawler, I want the endpoint to set Content-Type: text/turtle; charset=utf-8 and the appropriate Vary header, so that downstream parsers behave correctly.
  4. As an Ontos admin, I want unauthenticated requests to the catalog endpoint to return 401 with a WWW-Authenticate hint, so that the contract is "auth-only" and unambiguous in v1.
  5. As an authenticated caller, I want the catalog filtered by my audience tokens (the same filter applied to in-product visibility), so that I never see items I could not see inside Ontos itself.
  6. As an authenticated caller with no externally-published items in my audience, I want a valid empty dcat:Catalog (not a 404), so that downstream tooling can subscribe before any items are opted in.
  7. As an Ontos admin, I want each catalog response to carry the workspace's stable base IRI (http://<workspace>/) in every subject, so that consuming catalogs deduplicate listings correctly across refreshes.
  8. As an Ontos admin, I want the workspace base IRI configurable in Settings (defaulting to the Databricks workspace URL), so that publishing under a vanity domain is possible without code changes.
  9. As an Ontos admin, I want the catalog response to include dcterms:publisher for the workspace itself (deployment name, contact email, logo URL), so that consumers can attribute and display the source.
  10. As a peer Ontos workspace, I want the catalog to include the ontosmkt: extension predicates defined in #493 (offeringMode, audience, deliveryMethod, parentListing), so that #493's importer round-trips losslessly.

Catalog content

  1. As a peer Ontos workspace, I want each externally-published Data Product serialised as dprod:DataProduct, dcat:DatasetSeries, so that import-side type assertions in [PRD]: External Marketplace Providers #493 fire correctly.
  2. As a peer Ontos workspace, I want each Output Port serialised as dprod:OutputPort, dcat:DataService with a stable dcat:endpointURL, so that the consumer side can present the port as a first-class addressable resource.
  3. As a peer Ontos workspace, I want the ODCS contract for each port attached as a dcat:Distribution with dcat:mediaType "application/vnd.odcs+json" and dcat:downloadURL pointing at the existing contract export endpoint, so that [PRD]: External Marketplace Providers #493's importer takes the rich path rather than the DCAT-stub fallback.
  4. As a peer Ontos workspace, I want the ODPS export for each product attached as a dcat:Distribution with dcat:mediaType "application/vnd.odps+json", so that contract-less products still round-trip.
  5. As a peer Ontos workspace, I want active Business Owners serialised as dcterms:publisher references to foaf:Agent resources with foaf:mbox and foaf:name, so that ownership survives import.
  6. As a peer Ontos workspace, I want Data Domains serialised as dcat:Catalog with dcat:theme mappings to the standard EU theme taxonomy (where mappable) and skos:Concept IRIs from the workspace's own taxonomies (where not), so that consumers can filter by theme.
  7. As a peer Ontos workspace, I want every cross-product lineage edge in EntityRelationshipDb of relevant relationship type serialised as prov:wasDerivedFrom, so that an importer can reconstruct the lineage graph.
  8. As a peer Ontos workspace, I want each item's dcterms:identifier to equal its Ontos UUID, so that import-time IRI rewriting is deterministic.
  9. As a peer Ontos workspace, I want each item's dcterms:modified to reflect the latest of (item.updated_at, ports.updated_at, contract.updated_at), so that ETag-based refresh detection on the consumer side works.
  10. As a peer Ontos workspace, I do not want internal-only items (organization-scope, draft, or retired) to appear in the catalog under any circumstances, so that opt-in is the only path to external visibility.
  11. As an AI/MCP agent author, I want the catalog to be self-contained Turtle that resolves under standard prefixes (no Ontos-internal blank node syntax leaking out), so that an LLM with rdflib can answer questions about it directly.
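
Story 9 above defines dcterms:modified as the latest of several timestamps. A small sketch of that rule, assuming all timestamps are timezone-aware `datetime` values (the function names are illustrative):

```python
from datetime import datetime, timezone


def catalog_modified(item_updated_at, port_updated_ats=(), contract_updated_at=None):
    """Return the latest of the item, port, and contract timestamps."""
    candidates = [item_updated_at, *port_updated_ats]
    if contract_updated_at is not None:
        candidates.append(contract_updated_at)
    return max(candidates)


def as_xsd_datetime(moment):
    """Format an aware datetime as an xsd:dateTime lexical value in UTC."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

Taking the maximum means a change to a port or contract alone still moves the product's dcterms:modified forward, which is what consumer-side refresh detection relies on.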

Materialisation

  1. As an Ontos operator, I want the catalog materialised into the urn:ontos:self:catalog named graph in rdf_triples, so that SPARQL across our own catalog uses the same store as [PRD]: External Marketplace Providers #493's inbound catalogs.
  2. As an Ontos operator, I want the materialiser invoked transactionally from DataProductsManager, DataContractsManager, DataDomainsManager, and BusinessOwnersManager whenever an externally-scoped item changes, so that the catalog reflects reality on next fetch.
  3. As an Ontos operator, I want a low-frequency safety-net rebuild job (configurable, default every 60 minutes) that rebuilds the full catalog from scratch, so that drift from missed manager hooks is bounded.
  4. As an Ontos operator, I want a "Rebuild catalog now" admin button, so that I can force a rebuild after debugging.
  5. As an Ontos operator, I want the rebuild to be idempotent: if no externally-scoped item exists, the named graph is emptied and the endpoint returns a valid empty catalog, so that nothing is left over after the last opt-in is toggled off.
  6. As an Ontos operator, I want each rebuild to record start/end timestamps, item counts (products, contracts, domains, owners, distributions, triples), and any per-item failures into an admin-visible status row, so that I can diagnose problems.
  7. As an Ontos operator, I want a rebuild failure on a single item to skip that item and continue (with the failure recorded), so that one malformed product cannot break the whole catalog.
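
The skip-and-continue and idempotency requirements above can be sketched as a small loop. `build_listing` and `replace_graph` are hypothetical callables standing in for the builder/serializer and the rdf_triples repository; the status fields are a subset of what the PRD asks for.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

CATALOG_GRAPH = "urn:ontos:self:catalog"


@dataclass
class RebuildStatus:
    started_at: datetime
    finished_at: Optional[datetime] = None
    item_count: int = 0
    failures: dict = field(default_factory=dict)  # item id -> error message


def rebuild(items, build_listing, replace_graph):
    """Rebuild the catalog graph, skipping items that fail to build."""
    status = RebuildStatus(started_at=datetime.now(timezone.utc))
    listings = []
    for item in items:
        try:
            listings.append(build_listing(item))
        except Exception as exc:  # one malformed item must not break the catalog
            status.failures[item["id"]] = str(exc)
    # Replacing rather than appending keeps the rebuild idempotent;
    # an empty list empties the graph.
    replace_graph(CATALOG_GRAPH, listings)
    status.item_count = len(listings)
    status.finished_at = datetime.now(timezone.utc)
    return status
```

In the real rebuilder the replace step would presumably run inside one database transaction so that readers never observe a half-written graph.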

MCP / agent surface

  1. As an MCP-driven AI agent, I want a describe_product_as_dprod(product_id) tool that returns the product's serialised TTL fragment, so that I can pull a single product's metadata in linked-data form without hitting the full catalog.
  2. As an MCP-driven AI agent, I want a sparql_search_catalog(query) tool that runs SELECT and ASK queries against the urn:ontos:self:catalog named graph, so that I can reason over the workspace's products without scraping a TTL file.
  3. As an Ontos admin, I want SPARQL queries through the MCP tool to be capped (timeout, result size, complexity) and read-only, so that an unbounded agent query cannot DoS the database.
  4. As an Ontos admin, I want SPARQL queries through the MCP tool to be audience-filtered against the caller's MCP token principal, so that an agent never sees items the user behind the token could not see.
  5. As an Ontos admin, I want every MCP tool invocation related to the external catalog audit-logged with the query, principal, and result-size, so that I have full visibility into agent behaviour.
  6. As an MCP-driven AI agent, I do not want the MCP tools to ever mutate state, so that the catalog surface is strictly read-only.
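
One way to approach the read-only and result-size requirements above is a guard in front of the SPARQL engine. The sketch below is a first line of defence only: it inspects the query's leading keyword with regular expressions, whereas a production guard would more likely rely on a real SPARQL parser. `MAX_ROWS` and all function names are hypothetical.

```python
import re
from itertools import islice

ALLOWED_FORMS = {"SELECT", "ASK"}
MAX_ROWS = 1000  # hypothetical default cap

_COMMENT_LINE = re.compile(r"(?m)^[ \t]*#.*$")
_PROLOGUE = re.compile(
    r"\s*(?:PREFIX\s+[^\s:]*:\s*<[^>]*>|BASE\s*<[^>]*>)", re.IGNORECASE
)
_KEYWORD = re.compile(r"\s*([A-Za-z]+)")


def query_form(query):
    """Return the first keyword after comment lines and PREFIX/BASE declarations."""
    text = _COMMENT_LINE.sub("", query)
    match = _PROLOGUE.match(text)
    while match:
        text = text[match.end():]
        match = _PROLOGUE.match(text)
    keyword = _KEYWORD.match(text)
    return keyword.group(1).upper() if keyword else ""


def check_read_only(query):
    """Reject anything that is not a SELECT or ASK query."""
    form = query_form(query)
    if form not in ALLOWED_FORMS:
        raise PermissionError(
            f"only SELECT and ASK are allowed, got {form or 'nothing'}"
        )
    return form


def cap_rows(rows, limit=MAX_ROWS):
    """Return at most `limit` rows plus a flag saying whether more existed."""
    head = list(islice(rows, limit + 1))
    return head[:limit], len(head) > limit
```

Timeouts and complexity limits are not covered here; they depend on how the underlying store executes queries.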

Admin & observability

  1. As an Ontos admin, I want a new "External Catalog" panel under Settings, so that catalog configuration lives next to the existing marketplace and integration settings.
  2. As an Ontos admin, I want the panel to show: enabled toggle (workspace-level kill switch), workspace base IRI, publisher metadata (display name, contact email, logo URL), last rebuild status, current item counts, "rebuild now" button.
  3. As an Ontos admin, I want switching the workspace-level enabled toggle off (engaging the kill switch) to immediately return 404 on the catalog endpoint without erasing the materialised graph, so that re-enabling is fast.
  4. As an Ontos admin without the settings-external-catalog permission, I do not want to see the panel, so that producers do not accidentally land in admin territory.
  5. As a security admin, I want the /.well-known/dcat-catalog.ttl route to be excluded from any anonymous "well-known" allow-list (some deployments allow well-known prefixes anonymously), so that the catalog is not accidentally exposed.
  6. As an Ontos operator, I want a per-item dry-run preview ("what would this product look like in the external catalog?") on the Data Product detail page, so that producers can verify before flipping the opt-in switch.

Cross-feature integration

  1. As an Ontos operator running both [PRD]: External Marketplace Providers #493 (consumer) and this feature (publisher) in the same workspace, I want the two named-graph conventions (urn:provider:<id>:catalog for inbound, urn:ontos:self:catalog for outbound) to coexist cleanly, so that there is no name collision.
  2. As an Ontos operator running both features, I want a "Self-subscribe via #493 for testing" admin affordance that creates a #493 provider row pointing at our own catalog URL, with a marker that hides it from the marketplace browse view, so that round-trip testing is one click.
  3. As a peer Ontos workspace, I want imports through [PRD]: External Marketplace Providers #493 of items we previously published to recognise the source and treat them as re-imports (matching by dcterms:identifier), so that round-tripping does not duplicate listings.

Implementation Decisions

Modules

The work splits into a small number of deep modules with simple interfaces and large internal payoffs.

  • OntologyRealigner (not actually a module): a one-time hand edit to ontos-ontology.ttl. The file gains five new prefixes and roughly a dozen rdfs:subClassOf / owl:equivalentProperty assertions on existing classes. Version bumps to 2.1.0. No code change.
  • CatalogBuilder — pure function. Inputs: CatalogContext (workspace base IRI, calling principal, audience filter, publisher metadata) and a database session. Outputs: a ParsedCatalog dataclass containing the workspace dcat:Catalog and a list of ParsedListings for products, contracts, domains, and owners. The shape of ParsedCatalog deliberately mirrors the input shape [PRD]: External Marketplace Providers #493's DcatCatalogParser produces — same field names, same audience hint structure, same ontosmkt: extension carriers, same bundle-child parent linking — so that a fixture from one side is a fixture for the other.
  • DprodSerializer — pure function. Input: ParsedCatalog. Output: a Turtle string. Uses rdflib with the canonical DPROD / DCAT / PROV / dcterms: / ontosmkt: namespace bindings.
  • CatalogRebuilder — thin orchestrator. Calls CatalogBuilder with the workspace context, serialises via DprodSerializer, writes the resulting triples into urn:ontos:self:catalog in rdf_triples (replacing prior contents transactionally), and records a status row. Invoked from manager hooks and the background job.
  • ExternalCatalogRoute — single FastAPI route at GET /.well-known/dcat-catalog.ttl. Reads from the materialised named graph, applies the caller's audience filter, serialises the filtered subgraph back to TTL via rdflib, returns with caching headers.
  • DprodMcpTools — two new tool classes (DescribeProductAsDprod, SparqlSearchCatalog) under src/backend/src/tools/, following the existing BaseTool / ToolContext pattern.
  • ExternalCatalogSettingsPanel — admin UI panel under Settings. Reuses existing Settings panel patterns (cf. settings-directory.tsx).

Reuse from existing code

  • rdf_triples_repository.py with context_name-based bulk insert, remove-by-context, and SPARQL-ready storage covers the materialisation substrate with no new tables.
  • publication_scope enum on data_products and data_contracts is extended with the new external value via a tiny Alembic migration.
  • build_odps_export on DataProductsManager is the precedent for CatalogBuilder's sub-step that produces the attached ODPS distribution — the existing function is called from the builder rather than reimplemented.
  • The equivalent existing exporter on DataContractsManager provides the attached ODCS distribution.
  • Audience filtering reuses the comment-audience evaluator and the same token grammar (Entra group, team, role, data domain) already used elsewhere in the platform.
  • BusinessOwnersManager already exposes active owners per entity; the builder consumes that directly to populate dcterms:publisher.
  • EntityRelationshipDb is the source of prov:wasDerivedFrom edges.
  • BaseTool / ToolContext in src/backend/src/tools/ are the surface for the two new MCP tools; no MCP-server-level changes.
  • Manager-hook pattern (existing notify_* callbacks on managers) carries the rebuild trigger; no new event bus required.

Schema changes

  • Alembic migration adding 'external' to the publication_scope Postgres enum on data_products and data_contracts. Idempotent and reversible.
  • New external_catalog_status table (single-row by convention): last rebuild start/end/status, item counts, last error. Tiny.
  • No new tables for the catalog itself — rdf_triples carries everything via the urn:ontos:self:catalog context name.

API contracts

  • GET /.well-known/dcat-catalog.ttl — authenticated, returns text/turtle. Honours If-None-Match / If-Modified-Since. Returns 404 when the workspace-level kill switch is off.
  • POST /api/external-catalog/rebuild — admin-only, triggers immediate rebuild. Returns rebuild status.
  • GET /api/external-catalog/status — admin-only, returns last rebuild status payload.
  • GET /api/external-catalog/preview/{product_id} — producer-accessible, returns the TTL fragment that would be published for one product (dry-run).
  • PATCH /api/data-products/{id} accepts publication_scope=external (additive to the existing enum); same on /api/data-contracts/{id}.
  • MCP tools describe_product_as_dprod and sparql_search_catalog follow the existing tool registry pattern in src/backend/src/tools/registry.py.
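
The conditional-request behaviour of the catalog route can be sketched independently of FastAPI. This is a minimal illustration, assuming the ETag is a hash of the audience-filtered Turtle body and that authentication has already been handled upstream; `Vary: Authorization` is an assumption, since the PRD only asks for "the appropriate Vary header".

```python
import hashlib

TURTLE = "text/turtle; charset=utf-8"


def etag_for(body):
    """Derive a strong ETag from the response bytes."""
    return '"' + hashlib.sha256(body).hexdigest()[:32] + '"'


def _strip_weak(tag):
    return tag[2:] if tag.startswith("W/") else tag


def etag_matches(if_none_match, etag):
    """Weak comparison of an If-None-Match header value against one ETag."""
    if not if_none_match:
        return False
    if if_none_match.strip() == "*":
        return True
    return any(_strip_weak(t.strip()) == etag for t in if_none_match.split(","))


def respond(body, if_none_match=None, enabled=True):
    """Return (status, headers, payload) for an authenticated catalog request."""
    if not enabled:
        return 404, {}, b""  # workspace-level toggle is off
    etag = etag_for(body)
    # The body depends on the caller's audience, so caches must key on the caller.
    validators = {"ETag": etag, "Vary": "Authorization"}
    if etag_matches(if_none_match, etag):
        return 304, validators, b""
    return 200, {**validators, "Content-Type": TURTLE}, body
```

Because the body is filtered per caller, two callers with different audiences get different ETags for the same materialised graph, which is why the sketch hashes the filtered bytes rather than the graph's rebuild time.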

Wire format conventions (symmetric with #493)

  • Workspace catalog IRI: <workspace_base>/.well-known/dcat-catalog.ttl#catalog.
  • Item IRIs: <workspace_base>/api/data-products/<uuid>, <workspace_base>/api/data-contracts/<uuid>, etc. — using the same canonical URLs the UI already routes.
  • Named graph: urn:ontos:self:catalog.
  • Extension vocabulary: @prefix ontosmkt: <https://ontos.app/ext/marketplace#> . — the same prefix defined by [PRD]: External Marketplace Providers #493. Predicates reused without redefinition: ontosmkt:offeringMode, ontosmkt:audience, ontosmkt:deliveryMethod, ontosmkt:parentListing.
  • Attached distribution media types: application/vnd.odps+json for ODPS exports, application/vnd.odcs+json for ODCS exports — exactly the strings [PRD]: External Marketplace Providers #493 prefers on import.

Rebuild triggers

  • DataProductsManager.update / delete / status change: trigger if before-or-after publication_scope=external.
  • DataContractsManager.update / delete / status change: same.
  • DataDomainsManager.update: trigger if the domain has any externally-scoped child.
  • BusinessOwnersManager mutations: trigger if the entity has publication_scope=external.
  • EntityRelationshipsManager lineage edge mutations: trigger if either endpoint has publication_scope=external.
  • Background safety-net job: configurable cadence, default 60 min.
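
The trigger rules above reduce to a few small predicates. A sketch, with hypothetical function names and scopes passed as plain strings (`None` standing for "item deleted"):

```python
EXTERNAL = "external"


def item_change_triggers_rebuild(before_scope, after_scope):
    """Products and contracts: rebuild if the item was or is now external."""
    return EXTERNAL in (before_scope, after_scope)


def domain_change_triggers_rebuild(child_scopes):
    """Domains: rebuild only if at least one child item is external."""
    return any(scope == EXTERNAL for scope in child_scopes)


def lineage_change_triggers_rebuild(source_scope, target_scope):
    """Lineage edges: rebuild if either endpoint is external."""
    return EXTERNAL in (source_scope, target_scope)
```

Checking the scope before as well as after the change is what makes retraction work: an item moving from external to marketplace must still trigger a rebuild so that it disappears from the catalog.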

Phasing

  • P1 — Ontology + Builder + Serializer. Edits ontos-ontology.ttl. Adds CatalogBuilder and DprodSerializer with full unit-test coverage and the round-trip integration test against [PRD]: External Marketplace Providers #493's DcatCatalogParser. Adds the publication_scope=external enum value with the Alembic migration but no UI affordance yet. Self-contained, no externally-visible surface.
  • P2 — Endpoint, opt-in UX, MCP, materialisation. Adds CatalogRebuilder, the route, the manager hooks, the background job, the per-item toggle, the admin panel, the two MCP tools, and the audit-log integration.

Testing Decisions

A test is good when it asserts the external behaviour of a module — what callers observe — and is silent about which library produced the TTL, what intermediate dataclass shape was used, or which SPARQL plan the triple store chose. Tests that fail when a helper is renamed but no observable behaviour changes are anti-tests. Conversely, tests that fail when a published TTL document loses a field a peer Ontos importer needs are exactly the tests we want to be loud.

Unit-tested modules

  • CatalogBuilder — fixture-driven. Inputs: an in-memory product/contract/domain/owner graph plus a CatalogContext. Outputs: a ParsedCatalog. Tests assert: required DCAT-AP fields present, attached ODPS/ODCS distributions populated, ontosmkt: audience hints carried through, dcterms:publisher derived from active owners (and excludes inactive ones), lineage edges only included when relevant, items without publication_scope=external excluded under all configurations, audience filter applied. Prior art: existing tests around build_odps_export and the comment-audience evaluator.
  • DprodSerializer — golden-file tests. Inputs: a curated set of ParsedCatalog fixtures (single product, single contract, product-with-multiple-ports, lineage-graph, audience-restricted item, edge-case Unicode in names, deeply-nested domains). Outputs: canonical TTL. The golden files are checked in; the test parses both expected and actual via rdflib and asserts graph isomorphism, so that formatting differences do not break the suite. Prior art: existing ontology-loader tests.
  • Audience filter applied at builder level — pure function table tests. Input: token-list + principal context. Assertions: matches Entra group, team, role, domain tokens correctly; absent tokens means visible; logical OR semantics across tokens. Prior art: existing comment-audience tests.
  • CatalogRebuilder.rebuild — using a fake builder + fake serializer + real rdf_triples repository against a test DB. Assertions: successful rebuild replaces the urn:ontos:self:catalog graph transactionally; failures persist the error without clearing the cache; partial-failure mode skips bad items and records them; idempotent re-rebuild yields the same triples.
  • MCP tool tests — using a real materialised catalog in a test DB. Assertions: describe_product_as_dprod returns a parseable TTL fragment whose subject IRI matches the product; sparql_search_catalog applies the audience filter (a query that would return restricted items returns the unrestricted subset for a non-privileged caller); query timeouts and result-size caps trip and return a clean error.
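
The audience-filter semantics listed above (an absent token list means visible, logical OR across tokens) are small enough to sketch as a pure function. The `kind:value` token spelling and the `Principal` shape are assumptions; the PRD says only that the existing comment-audience grammar covers Entra group, team, role, and data domain tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    groups: frozenset = frozenset()
    teams: frozenset = frozenset()
    roles: frozenset = frozenset()
    domains: frozenset = frozenset()


def is_visible(tokens, principal):
    """Return True if the principal satisfies at least one audience token."""
    if not tokens:
        return True  # no audience restriction on the item
    held = {
        "group": principal.groups,
        "team": principal.teams,
        "role": principal.roles,
        "domain": principal.domains,
    }
    for token in tokens:
        kind, _, value = token.partition(":")
        if value in held.get(kind, frozenset()):
            return True  # OR semantics: a single match is enough
    return False
```

A table test over this function might look like the following:

```python
alice = Principal(groups=frozenset({"finance"}), roles=frozenset({"steward"}))
cases = [
    ([], True),
    (["group:finance"], True),
    (["team:risk", "role:steward"], True),
    (["team:risk", "domain:sales"], False),
]
for tokens, expected in cases:
    assert is_visible(tokens, alice) is expected
```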

Integration-tested flows

  • Round-trip with [PRD]: External Marketplace Providers #493. The single most important test: emit a curated multi-product, multi-port, multi-domain test fixture through DprodSerializer, feed the resulting TTL into a fresh in-memory copy of [PRD]: External Marketplace Providers #493's DcatCatalogParser, and assert that every field in the original ParsedCatalog is present, unchanged, in the re-parsed ParsedCatalog. This test forces the two PRDs to share fixtures and pins the wire format. Sharing the fixture directory between the two test suites is intentional; either side breaking the contract fails this test.
  • Endpoint end-to-end. Hit GET /.well-known/dcat-catalog.ttl with an authenticated user; parse the response with rdflib; assert expected triples for a product the user can see and absence of triples for a product the user cannot see (audience filter).
  • Endpoint anonymous rejection. Hit the same endpoint without credentials; assert 401.
  • Endpoint kill-switch. Switch the workspace-level enabled toggle off; hit the endpoint; assert 404. Switch it back on; assert 200 with the same content as before.
  • Manager hook triggers rebuild. Update a product's publication_scope to external; assert the materialised graph contains the product on next fetch. Toggle back to marketplace; assert the product disappears.
  • Audit log. Toggle publication_scope=external on a product; assert an audit log row with actor, before, after.
  • Ontology validation. Load ontos-ontology.ttl after the realignment; assert it parses as valid Turtle, every existing class is still present, and the new subClassOf assertions resolve.
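
The status codes asserted by the endpoint flows above can be summarised as one small decision function. This is a sketch under stated assumptions: the Product shape, the group-based audience model, and the precedence of the kill switch over the anonymous check are all hypothetical; the PRD only fixes 401 for anonymous callers, 404 when publication is switched off, and audience-filtered content otherwise.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Product:
    iri: str
    publication_scope: str      # "external" or "marketplace", as named in the PRD
    audiences: FrozenSet[str]   # hypothetical: empty means any authenticated user


def catalog_response(
    user_groups: Optional[FrozenSet[str]],  # None models an anonymous caller
    publishing_enabled: bool,               # False models an engaged kill switch
    products: Iterable[Product],
) -> Tuple[int, List[str]]:
    """Return (HTTP status, IRIs of products the caller may see)."""
    if not publishing_enabled:
        return 404, []  # assumption: the kill switch is checked first
    if user_groups is None:
        return 401, []
    visible = [
        p.iri
        for p in products
        if p.publication_scope == "external"
        and (not p.audiences or p.audiences & user_groups)
    ]
    return 200, visible
```

Keeping the decision in a pure function like this would let the audience filter be unit-tested without standing up the HTTP layer, while the end-to-end tests still cover the wiring.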

Out of testing scope (v1)

  • We do not test that rdflib's TTL serializer is byte-stable across versions — that is the library's job; we use graph-isomorphism comparisons.
  • We do not test SPARQL engine performance under contrived workloads; query caps are enforced at the tool layer, not the engine.
  • We do not test cross-workspace IRI uniqueness — that is a deployment concern, addressed by the configurable workspace base IRI.
  • We do not test SHACL conformance — SHACL shapes are deferred to a follow-up.
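
To illustrate why the tests compare graphs rather than bytes, here is a minimal standard-library sketch. It assumes both serialisations are N-Triples with no blank nodes and one triple per line, in which case graph isomorphism reduces to set equality of the lines; the function names are hypothetical. Graphs with blank nodes need a real isomorphism check such as rdflib.compare.isomorphic.

```python
def triple_set(ntriples: str) -> frozenset:
    """Collect the non-empty, non-comment lines of an N-Triples document."""
    lines = set()
    for raw in ntriples.splitlines():
        line = raw.strip()  # only outer whitespace; literals are left untouched
        if line and not line.startswith("#"):
            lines.add(line)
    return frozenset(lines)


def same_graph(a: str, b: str) -> bool:
    """Order-insensitive comparison, valid only for blank-node-free graphs."""
    return triple_set(a) == triple_set(b)
```

Two outputs that list the same triples in a different order differ byte-for-byte but compare equal here, which is the property the serializer tests rely on.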

Out of Scope

  • JSON-LD output. v1 is TTL only, matching #493's wire format. JSON-LD content negotiation, a stable JSON-LD context, and a @context-versioning policy are deferred to v2.
  • SHACL shapes. A formal SHACL constraint set for the published catalog (and validation tests against it) is deferred. v1 relies on the round-trip test as the contract.
  • ODRL policy modelling. Compliance policies, access policies, and licence terms are not rendered as odrl:Policy in v1; they remain Ontos-internal. Deferred to v2.
  • DQV quality measurements. Quality items from quality_items are not rendered as dqv:QualityMeasurement in v1. The internal data is rich enough; serialising is just deferred.
  • Anonymous public catalog access. v1 is authenticated only. An anonymous read-only mode (with reduced metadata for un-audienced items) is a v2 affordance.
  • Per-domain sub-catalogs. v1 publishes a single workspace-wide catalog. Per-domain dcat:Catalog partitioning is supported by the ontology (ontos:DataDomain rdfs:subClassOf dcat:Catalog) but not exposed as separate URLs.
  • Active federation handshake. v1 is passive: Ontos publishes, peer Ontos workspaces subscribe via #493. A mutual peering / capability-negotiation protocol is out of scope.
  • A /.well-known/ontos discovery doc. Listing the workspace's supported standards, contact, capabilities, and Ontos version is a nice-to-have for v2.
  • External-catalog publication of Asset-backed entities (Datasets, Dashboards, ML Models). v1 publishes Tier 1 dedicated entities only (Data Products, Data Contracts, Data Domains, Owners). Asset-backed entities are referenced indirectly through the product/contract graph.
  • Cross-workspace identity reconciliation. When two workspaces publish a product with the same dcterms:identifier, deduplication is the consumer's problem (and #493's). v1 does not attempt cross-workspace identity coordination.
  • Catalog-level signing or cryptographic provenance. No JWS-signed catalog, no HMAC. Authentication is the SSO layer.
  • OAuth 2.0 / DPoP for catalog consumers. Authentication uses the existing Ontos SSO mechanism; an OAuth-protected machine-to-machine catalog consumer flow is deferred.
  • Multi-language metadata. v1 publishes labels and descriptions in the language they are stored as in Ontos (typically English). @en tags are emitted, but no parallel @de/@fr/etc. fields are populated. i18n of catalog metadata is a v2 follow-up.

Further Notes

  • Why DPROD-on-top-of-DCAT-AP, not pure DPROD. DPROD is a profile of DCAT, not a replacement. The wider linked-data ecosystem (EU Data Spaces, Gaia-X, national open-data portals) consumes DCAT-AP. By emitting DCAT-AP with DPROD-specific assertions layered in, we maximise the set of consumers that can use the output. A pure DPROD endpoint would be a strictly smaller target.
  • #493 is the keystone. This PRD only makes sense because #493 exists. Without an inbound DCAT-AP consumer in Ontos, publishing a DCAT-AP catalog out of Ontos is half a feature. With #493, publishing closes the loop: two Ontos workspaces federate using nothing but the standards both ends already speak.
  • The light ontology realignment is the smallest change with the highest leverage. A dozen subClassOf assertions in ontos-ontology.ttl turn the workspace's entire RDF surface from "vendor-specific vocabulary" into "DPROD/DCAT-conformant vocabulary" at no runtime cost.
  • Symmetric vocabulary is a deliberate design choice. Using ontosmkt: and the attached ODPS/ODCS distribution media types from #493 means the parser on the inbound side is the same parser a peer Ontos would use to read our catalog. This is what makes the round trip lossless. Any future extension predicate added to #493 (e.g. for new delivery methods) becomes immediately usable on the publishing side.
  • Why no SHACL in v1. SHACL is a downstream consumer's tool for asserting structural conformance. Until we have at least one consumer outside #493 that wants to validate against shapes, shipping shapes is speculative. The round-trip test against #493 acts as a functional shape contract for v1.
  • The MCP SPARQL tool is intentionally read-only. Once SPARQL hits the catalog, the natural temptation is to expose write paths too. We do not, because (a) mutating state via SPARQL is hard to audit, and (b) the audit log invariant is much simpler if every external-catalog write goes through the existing manager APIs.
  • A round-trip self-subscribe button is the cheapest end-to-end test. A one-click admin affordance that creates a #493 provider row pointing at our own catalog URL gives us continuous integration of both PRDs in production: every materialisation on the publisher side immediately replays through the consumer side.
  • Standards alignment with EU Data Spaces / Gaia-X is a strategic bet. As with #493, picking DCAT-AP positions Ontos to be a native participant in those federations as they mature. No customer is required to engage with them, but the option is preserved.
  • The Marketplace Origin polymorphic panel from #493 is reusable for entities imported back into Ontos via the round-trip. If a peer Ontos imports our catalog, the materialised products on the consumer side already get the panel for free. No additional work.
  • What the v2 follow-up looks like. ODRL policy rendering, DQV quality metrics, SHACL shapes shipped as a separate ontos-shapes.ttl, JSON-LD content negotiation with a stable versioned context, per-domain sub-catalogs, anonymous public catalog access with reduced metadata, the /.well-known/ontos discovery document, multi-language metadata. None of these block v1; each is independently shippable on top of v1.
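
The read-only stance of the MCP SPARQL tool noted above could be enforced with a first-line gate along these lines. This is a hedged sketch, not the tool's actual code: is_read_only is a hypothetical helper that skips PREFIX/BASE declarations and accepts only the four SPARQL query forms. It fails closed, so a query that begins with a comment is rejected, and it is meant to sit in front of, not replace, a query-only parser in the engine.

```python
import re

# Zero or more PREFIX / BASE declarations, with surrounding whitespace.
_PROLOGUE = re.compile(
    r"\s*(?:(?:PREFIX\s+[^\s:]*:\s*<[^>]*>|BASE\s*<[^>]*>)\s*)*",
    re.IGNORECASE,
)
_READ_FORMS = {"SELECT", "ASK", "CONSTRUCT", "DESCRIBE"}


def is_read_only(query: str) -> bool:
    """True only if the first keyword after the prologue is a SPARQL query form."""
    rest = query[_PROLOGUE.match(query).end():]
    keyword = re.match(r"[A-Za-z]+", rest)
    return bool(keyword) and keyword.group(0).upper() in _READ_FORMS
```

For example, is_read_only("SELECT * WHERE { ?s ?p ?o }") is accepted, while any string whose first keyword is INSERT, DELETE, LOAD, CLEAR, or DROP is rejected before it reaches the engine. An allow-list of query forms is used rather than a deny-list of update keywords so that unknown or future update operations are rejected by default.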

Labels: roadmap/future, scope/contracts, scope/ontology, scope/products, scope/settings, type/feature, type/prd
