feat(rdf): SPARQL playground, MCP knowledge-graph tools, insights endpoints by harshach · Pull Request #28042 · open-metadata/OpenMetadata

harshach · 2026-05-11T18:29:03Z

Describe your changes

This PR closes the RDF fidelity gaps in the knowledge-graph stack and removes the experimental R2RML row-materialization feature (the schema-level concept-graph is the strategy going forward; row triples don't scale and were the wrong target). It adds: a SPARQL playground UI, five MCP knowledge-graph tools (SparqlQueryTool, EntityNeighborhoodTool, FindByTagTool, OntologyDescribeTool, ShaclValidateTool), insights endpoints (PageRank-based importance, Louvain communities, shortest-path explain-lineage, tag/glossary co-occurrence, recommendations), an inference-rule registry with a starter pack (transitive lineage, schema-tag/domain inheritance, PII-propagation-via-lineage), federated-SPARQL allowlist, expanded SHACL shapes + REST validator, custom-ontology upload/extension, PROV-O activity mapper, DQV quality mapper, usage mapper, and JSON-LD context coverage for AI/Automation/Governance entities. Deletes the R2RML mapping schema, validator, registry, applier, materializer workflow, OntologyEmitter sink, Linked-Data UI mode, and emitOntologyTriples profiler flag. Adds the OpenMetadata ontology TTL coverage (skos, dcat, prov, dprod, foaf) for the new entity types and a CHANGELOG.

Type of change

Improvement
New feature
Documentation

Tests

Unit + ITs added for every new component: SparqlQueryToolTest, EntityNeighborhoodToolTest, FindByTagToolTest, OntologyDescribeToolTest, ShaclValidateToolTest, CustomOntologyValidatorTest, SparqlFederationGuardTest, InferenceRuleValidatorTest, CentralityComputationTest, CommunityComputationTest, LouvainTest, PageRankTest, LineagePathBuilderTest, LineagePathFinderTest, ImportanceQueryBuilderTest, CoOccurrenceQueryBuilderTest, RecommendationsQueryBuilderTest, RdfShaclValidatorTest, OntologyDocumentTest, RdfUsageMapperTest, RdfJsonLdContextTest, expanded RdfPropertyMapperTest, and RdfResourceIT. SPARQL playground covered by SparqlPlayground.test.tsx.

UI screen recording / screenshots

SPARQL playground + KnowledgeGraph view: screenshot/recording to be attached on PR.

🤖 Generated with Claude Code

Summary by Gitar

Configuration updates:
- Renamed remoteEndpoint to RDF_REMOTE_ENDPOINT in openmetadata.yaml for RDF configuration.
Migration handling:
- Updated MigrationWorkflow.java to surface and list specific pending migration versions in error messages.
UI localization:
- Added panel-background-color key across all supported language files in openmetadata-ui.
Generated Types:
- Added rdfConfiguration.ts to define schema types for RDF-related configuration, supporting the new knowledge-graph infrastructure.

_{This will update automatically on new commits.}

Greptile Summary

This PR significantly expands the RDF/knowledge-graph stack by adding a SPARQL playground UI, five MCP knowledge-graph tools, a suite of graph-analytics insights endpoints (PageRank centrality, Louvain communities, BFS lineage paths, tag co-occurrence, recommendations), an inference-rule registry, federated-SPARQL allowlist enforcement, expanded SHACL validation, custom-ontology upload, and broader JSON-LD context coverage — while removing the experimental R2RML row-materialization path.

Security hardenings are solid: SPARQL UPDATE validation now uses UpdateFactory (Jena parser) instead of fragile keyword matching; the federation guard uses ElementWalker to catch SERVICE clauses structurally; RdfIriValidator centralises IRI sanitisation for SPARQL interpolation; all new endpoints are authorizeAdmin-gated.
New MCP tools (SparqlQueryTool, EntityNeighborhoodTool, FindByTagTool, OntologyDescribeTool, ShaclValidateTool) are read-only, enforce the federation allowlist, and cap response sizes.
CustomOntologyValidator.hasCycle has a correctness bug: onStack is not reset between outer-loop DFS calls, so nodes that merely reference nodes in a confirmed cycle are falsely reported as cyclic themselves, causing valid ontology extensions to be rejected at the admin validation endpoint.

Confidence Score: 3/5

The SPARQL injection hardenings and federation allowlist are well-implemented, but the cycle-detection bug in CustomOntologyValidator will falsely reject valid ontology extensions once any extension with a true cycle has been validated in the same request.

The bulk of the change — federation guard, IRI validator, read-only MCP tools, insights endpoints, SHACL validation — is carefully implemented and admin-gated. The one correctness defect in hasCycle produces incorrect validation results for a real admin workflow (custom ontology registration) with no easy workaround; it needs a one-line fix before this path is relied upon.

openmetadata-service/src/main/java/org/openmetadata/service/rdf/extension/CustomOntologyValidator.java — the detectClassHierarchyCycles / hasCycle pair needs onStack.clear() between outer iterations.

Important Files Changed

Filename	Overview
openmetadata-service/src/main/java/org/openmetadata/service/rdf/extension/CustomOntologyValidator.java	New admin-only validator for custom ontology extensions; contains a P1 bug in hasCycle where onStack is not reset between outer DFS iterations, producing false-positive cycle errors for nodes that merely reference cyclic nodes without being part of a cycle themselves.
openmetadata-service/src/main/java/org/openmetadata/service/rdf/federation/SparqlFederationGuard.java	New SPARQL SERVICE-clause allowlist guard; correctly uses Jena's ElementWalker to parse real SERVICE elements (not text matching), handles subqueries and variable endpoints, and is applied at both the REST and MCP tool SPARQL paths.
openmetadata-service/src/main/java/org/openmetadata/service/resources/rdf/RdfResource.java	Major expansion adding SHACL validation, inference-rule CRUD, insights endpoints (importance, centrality, communities, lineage path, recommendations, co-occurrence), and ontology extension management — all guarded by authorizeAdmin. The private executeSparqlQuery helper correctly enforces the federation guard on every query path including internally-built insight queries.
openmetadata-mcp/src/main/java/org/openmetadata/mcp/tools/SparqlQueryTool.java	New MCP SPARQL read-only tool; correctly rejects UPDATE/INSERT/DELETE via Jena parser (not text matching), enforces the federation allowlist, caps response size at configurable maxBytes with safe UTF-8 split logic, and restricts accepted formats to a known set.
openmetadata-mcp/src/main/java/org/openmetadata/mcp/tools/FindByTagTool.java	New MCP tag-search tool; SPARQL is built by string interpolation but the FQN is validated to block quotes, backslash, and newlines before insertion into a double-quoted SPARQL string literal, preventing injection.
openmetadata-service/src/main/java/org/openmetadata/service/rdf/insights/LineagePathFinder.java	BFS lineage path-finder; uses per-frontier SPARQL round-trips with a visited set to prevent cycles and a parents map for path reconstruction. Inputs are validated by LineagePathBuilder.validateNodeUri.
openmetadata-service/src/main/java/org/openmetadata/service/rdf/insights/PageRank.java	Clean iterative weighted PageRank with dangling-node mass redistribution, convergence tolerance, and post-normalization. Handles disconnected components and isolated sinks correctly.
openmetadata-ui/src/main/resources/ui/src/pages/SparqlPlayground/SparqlPlayground.component.tsx	New admin SPARQL playground UI; uses window.prompt() for saved-query naming (P2 UX issue). Format options, inference selection, sample queries, and result rendering are well structured.
openmetadata-service/src/main/java/org/openmetadata/service/rdf/RdfIriValidator.java	New shared IRI validator for SPARQL interpolation; blocks angle-brackets, quotes, backticks, and control characters, enforces http/https scheme, and caps length at 2048. Not yet reused by LineagePathBuilder.validateNodeUri.
openmetadata-service/src/main/java/org/openmetadata/service/migration/api/MigrationWorkflow.java	Minor improvement: pending migration versions are now surfaced in the error message, making startup failures easier to diagnose.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant UI as SparqlPlayground UI
    participant REST as RdfResource (JAX-RS)
    participant Guard as SparqlFederationGuard
    participant Repo as RdfRepository
    participant MCP as MCP SparqlQueryTool

    UI->>REST: "POST /v1/rdf/sparql {query, format, inference}"
    REST->>Guard: enforce(query)
    alt SERVICE clause blocked
        Guard-->>REST: FederationDisallowedException
        REST-->>UI: 403 Forbidden
    else allowed
        Guard-->>REST: ok
        REST->>Repo: executeSparqlQuery(query, mimeType)
        Repo-->>REST: results
        REST-->>UI: 200 + result body
    end

    MCP->>MCP: SparqlQueryTool.execute(params)
    MCP->>MCP: QueryFactory.create(query) — reject UPDATE
    MCP->>Guard: new SparqlFederationGuard(config).enforce(query)
    alt SERVICE blocked
        Guard-->>MCP: FederationDisallowedException
        MCP-->>MCP: return error map
    else allowed
        MCP->>Repo: executeSparqlQuery(query, mimeType)
        Repo-->>MCP: results (capped at maxBytes)
        MCP-->>MCP: return result map
    end

    REST->>REST: "GET /v1/rdf/insights/path?from=..&to=.."
    REST->>REST: LineagePathBuilder.validateNodeUri(from, to)
    REST->>Repo: BFS frontier SPARQL queries (per hop)
    Repo-->>REST: neighbor rows
    REST-->>REST: reconstruct path + decorate with rdf:type
    REST-->>UI: 200 Path JSON

%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant UI as SparqlPlayground UI
    participant REST as RdfResource (JAX-RS)
    participant Guard as SparqlFederationGuard
    participant Repo as RdfRepository
    participant MCP as MCP SparqlQueryTool

    UI->>REST: "POST /v1/rdf/sparql {query, format, inference}"
    REST->>Guard: enforce(query)
    alt SERVICE clause blocked
        Guard-->>REST: FederationDisallowedException
        REST-->>UI: 403 Forbidden
    else allowed
        Guard-->>REST: ok
        REST->>Repo: executeSparqlQuery(query, mimeType)
        Repo-->>REST: results
        REST-->>UI: 200 + result body
    end

    MCP->>MCP: SparqlQueryTool.execute(params)
    MCP->>MCP: QueryFactory.create(query) — reject UPDATE
    MCP->>Guard: new SparqlFederationGuard(config).enforce(query)
    alt SERVICE blocked
        Guard-->>MCP: FederationDisallowedException
        MCP-->>MCP: return error map
    else allowed
        MCP->>Repo: executeSparqlQuery(query, mimeType)
        Repo-->>MCP: results (capped at maxBytes)
        MCP-->>MCP: return result map
    end

    REST->>REST: "GET /v1/rdf/insights/path?from=..&to=.."
    REST->>REST: LineagePathBuilder.validateNodeUri(from, to)
    REST->>Repo: BFS frontier SPARQL queries (per hop)
    Repo-->>REST: neighbor rows
    REST-->>REST: reconstruct path + decorate with rdf:type
    REST-->>UI: 200 Path JSON

_{Reviews (1): Last reviewed commit: "Fix Java checkstyle" | Re-trigger Greptile}

Greptile also left 4 inline comments on this PR.

github-actions · 2026-05-11T18:33:45Z

✅ TypeScript Types Auto-Updated

The generated TypeScript types have been automatically updated based on JSON schema changes in this PR.

github-actions · 2026-05-11T18:35:08Z

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

Copilot

Pull request overview

This PR expands OpenMetadata’s RDF/knowledge-graph capabilities end-to-end: UI support for running SPARQL, spec-level ontology/context updates, and service/MCP tooling for querying, validation (SHACL), federation-guarding, inference rules, and “insights” SPARQL query builders.

Changes:

Adds SPARQL playground client support + routing, plus supporting UI enums/interfaces and locale keys.
Introduces/extends RDF specs (ontology changelog, JSON-LD contexts, RDF configuration schema) and adds inference-rule + federation configuration shapes.
Adds service-side RDF utilities (ontology document serving, SHACL validator, usage/quality/activity mappers, insights query builders) and MCP tools that expose read-only SPARQL + graph utilities.

Reviewed changes

Copilot reviewed 97 out of 100 changed files in this pull request and generated 9 comments.

Show a summary per file

File	Description
openmetadata-ui/src/main/resources/ui/src/rest/rdfAPI.ts	Adds SPARQL playground REST helper (`runSparqlQuery`) and result typing.
openmetadata-ui/src/main/resources/ui/src/pages/SparqlPlayground/SparqlPlayground.interface.ts	Adds local types/constants for saved queries and sample queries.
openmetadata-ui/src/main/resources/ui/src/locale/languages/en-us.json	Adds SPARQL playground-related labels/messages; removes one label key.
openmetadata-ui/src/main/resources/ui/src/locale/languages/zh-tw.json	Adds SPARQL playground-related labels/messages; removes one label key.
openmetadata-ui/src/main/resources/ui/src/locale/languages/zh-cn.json	Adds SPARQL playground-related labels/messages; removes one label key.
openmetadata-ui/src/main/resources/ui/src/locale/languages/tr-tr.json	Adds SPARQL playground-related labels/messages; removes one label key.
openmetadata-ui/src/main/resources/ui/src/locale/languages/th-th.json	Adds SPARQL playground-related labels/messages; removes one label key.
openmetadata-ui/src/main/resources/ui/src/locale/languages/ru-ru.json	Adds SPARQL playground-related labels/messages; removes one label key.
openmetadata-ui/src/main/resources/ui/src/locale/languages/pt-pt.json	Adds SPARQL playground-related labels/messages; removes one label key.
openmetadata-ui/src/main/resources/ui/src/locale/languages/pt-br.json	Adds SPARQL playground-related labels/messages; removes one label key.
openmetadata-ui/src/main/resources/ui/src/locale/languages/pr-pr.json	Adds SPARQL playground-related labels/messages; removes one label key.
openmetadata-ui/src/main/resources/ui/src/locale/languages/nl-nl.json	Adds SPARQL playground-related labels/messages; removes one label key.
openmetadata-ui/src/main/resources/ui/src/locale/languages/mr-in.json	Adds SPARQL playground-related labels/messages; removes one label key.
openmetadata-ui/src/main/resources/ui/src/locale/languages/ko-kr.json	Adds SPARQL playground-related labels/messages; removes one label key.
openmetadata-ui/src/main/resources/ui/src/locale/languages/ja-jp.json	Adds SPARQL playground-related labels/messages; removes one label key.
openmetadata-ui/src/main/resources/ui/src/locale/languages/he-he.json	Adds SPARQL playground-related labels/messages; removes one label key.
openmetadata-ui/src/main/resources/ui/src/locale/languages/gl-es.json	Adds SPARQL playground-related labels/messages; removes one label key.
openmetadata-ui/src/main/resources/ui/src/locale/languages/fr-fr.json	Adds SPARQL playground-related labels/messages; removes one label key.
openmetadata-ui/src/main/resources/ui/src/locale/languages/es-es.json	Adds SPARQL playground-related labels/messages; removes one label key.
openmetadata-ui/src/main/resources/ui/src/locale/languages/de-de.json	Adds SPARQL playground-related labels/messages; removes one label key.
openmetadata-ui/src/main/resources/ui/src/locale/languages/ar-sa.json	Adds SPARQL playground-related labels/messages; removes one label key.
openmetadata-ui/src/main/resources/ui/src/enums/codemirror.enum.ts	Adds CodeMirror mode value for SPARQL.
openmetadata-ui/src/main/resources/ui/src/constants/constants.ts	Adds route constant for SPARQL playground.
openmetadata-ui/src/main/resources/ui/src/components/KnowledgeGraph/KnowledgeGraph.tsx	Refactors metadata-mode body rendering; removes a header label.
openmetadata-ui/src/main/resources/ui/src/components/AppRouter/AuthenticatedAppRouter.tsx	Registers lazy-loaded SPARQL playground route.
openmetadata-spec/src/main/resources/rdf/ontology/openmetadata-prov.ttl	Adds missing RDF prefix in PROV extension TTL.
openmetadata-spec/src/main/resources/rdf/ontology/CHANGELOG.md	Introduces ontology changelog documenting fidelity changes.
openmetadata-spec/src/main/resources/rdf/contexts/lineage.jsonld	Adjusts lineage JSON-LD mappings to avoid predicate collisions.
openmetadata-spec/src/main/resources/rdf/contexts/governance.jsonld	Aligns governance mappings with SKOS predicates and adds new fields.
openmetadata-spec/src/main/resources/rdf/contexts/automation.jsonld	Adds JSON-LD context for automation/workflow entities.
openmetadata-spec/src/main/resources/rdf/contexts/ai.jsonld	Adds JSON-LD context for AI/MCP-related entities.
openmetadata-spec/src/main/resources/json/schema/api/configuration/rdfConfiguration.json	Adds federation config block to RDF configuration schema.
openmetadata-spec/src/main/resources/json/schema/api/configuration/rdf/inferenceRule.json	Adds schema for inference rule objects (CONSTRUCT/RDFS placeholder).
openmetadata-spec/src/main/resources/json/schema/api/configuration/rdf/customOntology.json	Adds schema for custom ontology extension definitions.
openmetadata-service/src/test/java/org/openmetadata/service/resources/rdf/RdfShaclValidatorTest.java	Adds SHACL validation tests for key shape constraints.
openmetadata-service/src/test/java/org/openmetadata/service/resources/rdf/OntologyDocumentTest.java	Adds tests for ontology serving endpoint serialization.
openmetadata-service/src/test/java/org/openmetadata/service/rdf/translator/RdfUsageMapperTest.java	Adds tests for RDF usage summary mapping.
openmetadata-service/src/test/java/org/openmetadata/service/rdf/insights/RecommendationsQueryBuilderTest.java	Adds tests for recommendations SPARQL builder validation/shape.
openmetadata-service/src/test/java/org/openmetadata/service/rdf/insights/CoOccurrenceQueryBuilderTest.java	Adds tests for co-occurrence/popularity/reach SPARQL builders.
openmetadata-service/src/main/resources/rdf/inference-rules/transitive-lineage-closure.json	Adds starter inference rule JSON.
openmetadata-service/src/main/resources/rdf/inference-rules/schema-tag-inheritance.json	Adds starter inference rule JSON.
openmetadata-service/src/main/resources/rdf/inference-rules/pii-propagation-via-lineage.json	Adds starter inference rule JSON.
openmetadata-service/src/main/resources/rdf/inference-rules/domain-membership-inheritance.json	Adds starter inference rule JSON.
openmetadata-service/src/main/java/org/openmetadata/service/resources/rdf/RdfShaclValidator.java	Adds SHACL shapes loader + validator helper.
openmetadata-service/src/main/java/org/openmetadata/service/resources/rdf/OntologyDocument.java	Adds merged ontology document loader + serializer for multiple formats.
openmetadata-service/src/main/java/org/openmetadata/service/rdf/translator/RdfUsageMapper.java	Adds mapper emitting usage-summary triples.
openmetadata-service/src/main/java/org/openmetadata/service/rdf/translator/RdfQualityMapper.java	Adds mapper emitting DQV quality measurement resources.
openmetadata-service/src/main/java/org/openmetadata/service/rdf/translator/RdfActivityMapper.java	Adds PROV activity mapping for pipeline runs.
openmetadata-service/src/main/java/org/openmetadata/service/rdf/translator/JsonLdTranslator.java	Loads new JSON-LD contexts and assigns column IDs for named column resources.
openmetadata-service/src/main/java/org/openmetadata/service/rdf/RdfUtils.java	Adds RDF types for new entities and a helper to mint column URIs.
openmetadata-service/src/main/java/org/openmetadata/service/rdf/insights/RecommendationsQueryBuilder.java	Adds recommendations SPARQL builder.
openmetadata-service/src/main/java/org/openmetadata/service/rdf/insights/ImportanceQueryBuilder.java	Adds importance ranking SPARQL builder.
openmetadata-service/src/main/java/org/openmetadata/service/rdf/insights/CoOccurrenceQueryBuilder.java	Adds co-occurrence/popularity/reach SPARQL builders.
openmetadata-service/src/main/java/org/openmetadata/service/rdf/inference/InferenceRuleValidator.java	Adds strict validation for inference rule bodies.
openmetadata-service/src/main/java/org/openmetadata/service/rdf/inference/InferenceRuleRegistry.java	Adds in-memory inference rule registry + starter pack loader.
openmetadata-service/src/main/java/org/openmetadata/service/rdf/federation/SparqlFederationGuard.java	Adds SERVICE allowlist enforcement for federated SPARQL.
openmetadata-service/src/main/java/org/openmetadata/service/rdf/extension/CustomOntologyRegistry.java	Adds in-memory registry for custom ontology extensions.
openmetadata-mcp/src/test/java/org/openmetadata/mcp/tools/OntologyDescribeToolTest.java	Adds tests for ontology describe MCP tool.
openmetadata-mcp/src/test/java/org/openmetadata/mcp/tools/FindByTagToolTest.java	Adds tests for find-by-tag MCP tool.
openmetadata-mcp/src/test/java/org/openmetadata/mcp/tools/EntityNeighborhoodToolTest.java	Adds tests for entity neighborhood MCP tool.
openmetadata-mcp/src/main/resources/json/data/mcp/tools.json	Registers new MCP tool schemas for KG/SPARQL features.
openmetadata-mcp/src/main/java/org/openmetadata/mcp/tools/SparqlQueryTool.java	Adds read-only SPARQL MCP tool with federation guard + truncation.
openmetadata-mcp/src/main/java/org/openmetadata/mcp/tools/ShaclValidateTool.java	Adds SHACL validation MCP tool (entity-scoped or full graph).
openmetadata-mcp/src/main/java/org/openmetadata/mcp/tools/OntologyDescribeTool.java	Adds ontology describe MCP tool (full ontology or DESCRIBE).
openmetadata-mcp/src/main/java/org/openmetadata/mcp/tools/FindByTagTool.java	Adds MCP tool to find entities by tag/glossary FQN.
openmetadata-mcp/src/main/java/org/openmetadata/mcp/tools/DefaultToolContext.java	Wires new MCP tools into tool dispatcher.
ingestion/src/metadata/workflow/profiler.py	Adds optional ontology emission step to profiler workflow.
docker/development/docker-compose-postgres-fuseki.yml	Adds RDF endpoint env var for Fuseki-based dev stack.

Comments suppressed due to low confidence (1)

openmetadata-service/src/main/java/org/openmetadata/service/rdf/RdfUtils.java:20

New RDF types were added for workflowinstance, agentexecution, mcpexecution, automation, etc., but PROV_ACTIVITY_TYPES was not updated. Since JsonLdTranslator.toRdf() relies on getProvType() to add prov:Activity typing (when the primary rdf:type is not already a PROV class), these new execution-like entities will miss rdf:type prov:Activity, reducing PROV-O compatibility and potentially breaking queries that filter by prov:Activity. Please extend the PROV type sets to cover the new entity types.

  private static final Set<String> PROV_ACTIVITY_TYPES =
      Set.of(
          "pipeline",
          "ingestionpipeline",
          "storedprocedure",
          "dbtpipeline",
          "workflow",
          "pipelinerun");

github-actions · 2026-05-11T18:49:15Z

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

…ntic correctness Addresses the open review comments on PR #28042. Each fix is independent. P0 — correctness / security - ingestion/profiler.py: drop residual OntologyEmitter import (module removed in the R2RML pivot). Every pytest that imports profiler.py was failing with ModuleNotFoundError; reverts file to match main. - RdfResource.validateGraph + ShaclValidateTool: replace `entityUri.replace(">", "")` with strict absolute-http(s)-IRI validation. Reject control chars, whitespace, quotes, and angle brackets up front. Closes the SPARQL-injection vector via newlines / # comments in the DESCRIBE template. - EntityNeighborhoodTool.buildConstructQuery: rewrite to emit each path edge in its own UNION arm with the correct subject. The previous BIND(<entity> AS ?s) applied across all arms collapsed every multi-hop edge onto the start node. P1 — semantic / robustness - CommunityComputation.parseGraph: stop emitting both directions; the FILTER in the SPARQL canonicalises pairs and Louvain.addAllEdges symmetrises the adjacency internally. Double symmetrisation was doubling every edge weight. - RdfActivityMapper: emit `prov:wasInformedBy` (Activity → Activity) instead of `prov:wasGeneratedBy` (Entity → Activity) for pipeline-run → pipeline. Closes the PROV-O domain/range inversion. - RdfQualityMapper.measurementUri: deterministic URI built from subject + metric + timestamp. Random UUID was leaking orphan QualityMeasurement nodes on every re-emit because deletion only follows subject/object on the entity itself. - SparqlQueryTool: byte-aware truncation. Previous substring-by-char cap could exceed maxBytes for multi-byte UTF-8 and never enforced a real byte limit. - ShaclValidateTool: require explicit `fullGraph=true` to validate the entire triplestore; reject otherwise. Prevents accidental OOM on multi-GB graphs. - RdfResource.getFederationGuard: synchronise the lazy-init path; the volatile field was double-checked without a lock. P2 — code quality - RdfResource.validateGraph: replace 14 inline FQN class names with proper imports (Model / ModelFactory / Lang / RDFDataMgr / RDFFormat / ValidationReport). - RdfShaclValidator + OntologyDocument: catch RuntimeException (covers RiotException) in the static-init shapes/ontology load. A corrupt classpath resource now degrades to an empty model instead of failing class init and taking down /rdf/* and MCP describe. - i18n: restore `label.view-mode` (still referenced by OntologyExplorer/FilterToolbar.tsx) and route the SparqlPlayground sample-query names through translation keys (`label.sparql-sample-*`). All 17 locales synced via `yarn i18n`. Tests updated where the fix changes observable behaviour: - EntityNeighborhoodToolTest: assert the new multi-hop subject preservation and the new depth-3 chain variable name. - CommunityComputationTest: assert the directed-single-edge behaviour (Louvain symmetrises internally). - RdfPropertyMapperTest: assert `prov:wasInformedBy` instead of `prov:wasGeneratedBy` for the pipeline activity. - SparqlPlayground.test.tsx: drive the sample button by `nameKey`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 95 out of 98 changed files in this pull request and generated 4 comments.

#28289) The ContextMemory schema (#28224) ships MemoryVisibility (PRIVATE/SHARED/ ENTITY) and shareConfig.sharedWith, but the CRUD resource didn't enforce them — any non-admin user could read another user's PRIVATE memory via GET /v1/contextCenter/memories/{id} or pick them out of /memories list results. Wire visibility into list/get/getByName via a small ContextMemoryVisibility helper: - list: drop non-visible memories from the response data - get / getByName: 403 (ForbiddenException) when not visible to the caller - Visibility rule: admin → all, owner → own memory, ENTITY → everyone, SHARED → only principals listed in sharedWith (with team/domain expansion), PRIVATE / null shareConfig → owner only

…to databases (#28288) * fix(ui): return 404 for unknown service category instead of fallback to databases * fix ui checkstyle

…7 with retry + add workflow_dispatch (#28292) * ci(airflow-apis-tests): retry Sonar PR scan on JRE-provisioning flake Mirror the py-tests pattern: migrate from the deprecated sonarsource/sonarcloud-github-action@master to SonarSource/sonarqube-scan-action@v7, mark the PR scan continue-on-error, and add a sleep+retry step so a transient 'Failed to query JRE metadata' from Sonar's JRE-provisioning endpoint no longer fails the job on first attempt. Hoist the shared sonar args into a workflow-level SONAR_OPTS env. * ci(airflow-apis-tests): allow workflow_dispatch + run Sonar step on it Add workflow_dispatch trigger so the Sonar retry path can be exercised from the Actions UI without opening a PR, and extend the Sonar PR step (plus its wait+retry siblings) to run on the dispatch event. * ci(airflow-apis-tests): scope Sonar steps to pull_request_target only Drop workflow_dispatch from the Sonar PR/retry step conditions so manual runs don't fire the scanner with empty -Dsonar.pullrequest.* flags (would create a branch entry in SonarCloud, per gitar-bot review). Dispatch trigger stays for re-running the build/test surface; Sonar will only fire on a real PR where the pull-request context exists.

…28273) * fix(logging): synchronous shutdown captures full streamable log tail V1's close() returned before the daemon sender delivered /close, leaving the server-side multipart upload unfinalized — workflow tail (Execution Time Summary, Workflow Summary, "Workflow finished in time") never reached S3, only partial.txt did. Redesign: - Single Queue + single daemon worker; threading.Event for control plane instead of a sentinel on the data queue. - flush(timeout) / shutdown(timeout) are synchronous: caller blocks until queued records are delivered or deadline. - atexit hook covers callers that forget to shut down. - Shutdown ships a final metrics line + synchronous /close POST. On join timeout, force-stops the worker and retries on a fresh REST so /close is never starved by an in-flight POST. - log_server_version() decoupled from validate_versions() so the "client running" line is captured by the handler instead of being emitted before setup. Counters (shipped, failed, dropped, timed_out) are logged at shutdown so silent loss becomes loud. * fix(logging): close flush() TOCTOU between dequeue and POST start flush() polled (buffer.empty() AND not _post_in_flight). The worker set _post_in_flight inside _post_batch, which left a microsecond gap after buffer.get() returned (queue empty) but before _post_batch ran (flag still clear). A flush() that landed in that gap could conclude "drained" while the worker held a batch about to ship. Not a data-loss bug in our caller (shutdown() always follows flush() with worker.join()), but flush() is public and the contract should hold standalone. Set _post_in_flight inside _collect_batch immediately after the first successful get(); the gap shrinks to one Python line. The flag stays clear while the worker idles in buffer.get(timeout=...), so flush() still returns fast when there's no work. * refactor(logging): name magic timeouts in streamable handler Hardcoded numbers in flush() and shutdown() were not self-documenting and tied callers to undocumented invariants. Promoted to named class constants: - FLUSH_DEFAULT_SEC (5.0): flush() default deadline. - FLUSH_POLL_SEC (0.05): how often flush() rechecks state. - FORCE_STOP_JOIN_SEC (2.0): secondary worker join after force-stop. flush(timeout) now uses the Optional[None] sentinel pattern matching shutdown(timeout). Added a comment to the poll-sleep explaining the two-conditions-AND requirement that motivates it. * fix(logging): close worker/main _client race + mock log_server_version Two issues surfaced by the PR review: 1. Workflow.__init__ now calls metadata.log_server_version(), which hits GET /system/version when not mocked. conftest pre-mocks the sibling validate_versions; add log_server_version so unit tests don't blow up on collection. 2. After force-stop, shutdown() reassigned self._client = REST(...). The worker's finally block (which closes self._client) could then close the freshly-created client under the main thread's metrics and /close POSTs. Use a local 'post_close_client' instead so the worker's session and the main thread's never alias. Also document on shutdown(timeout=...) that the post-stop metrics and /close POSTs carry their own HTTP timeouts, so total wall-time can be roughly timeout + 32s in pathological cases.

…8260) * feat(mysql): support custom queryHistoryTable for usage & lineage MySQL lineage/usage extraction reads query history from one of two hardcoded tables: `mysql.general_log` (default) and `mysql.slow_log` (when `useSlowLogs=true`). Many deployments don't grant ingestion users access to either of these system tables and instead replicate query history into a custom table or view, leaving the connector with no way to point at the replacement. Add an optional `queryHistoryTable` field to MysqlConnection. When set, it overrides the table read by the lineage/usage SQL and by the test-connection probe — for both the general-log and slow-log paths. Column expectations (`argument` vs `sql_text`, `event_time` vs `start_time`) still follow `useSlowLogs`, so the custom table must expose columns compatible with the selected path. Mirrors the existing `queryHistoryTable` pattern in the Trino connector (parameterized `{query_history_table}` placeholder in the SQL template, resolved at format time from the service connection). Fixes #28089 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Update generated TypeScript types * fix(mysql): add queryHistoryTable to SDK builder example The exhaustive MysqlConnection construction in the SDK builder example enumerates every connection field; basedpyright (run with --baselinemode=discard) flagged the newly added queryHistoryTable as a missing constructor argument. Pass it explicitly as None to keep the example exhaustive and the static-checks job green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

…27987) * fix(alerts): make matchAnyEntityFqn match FQNs literally, not as regex * fix(alerts): strict literal matching across alert filter functions * fix(policy): AST-based SpEL validation so string-literal content is not parsed as syntax * fix(data-insight): evaluate chart formulas outside the policy/alert validator * test(playwright): ObservabilityAlerts must pick filter options by FQN, not bare name * test(playwright): focus combobox before fill in alert action input loop * test(playwright): extend inputValueId pattern to alert action inputs * refactor(alerts): move UUID regex to constants and use size variable

* fix(ometa): use requests instead of httpx for SSE transport In some enterprise environments httpx's HTTP/1.1 wire pattern is reset by network security middleboxes that pass requests/wget through. The symptom is httpx.ReadError: [Errno 104] Connection reset by peer at _receive_response_headers, with no server-side access log line because the connection is reset before any HTTP response is committed. requests and wget succeed against the same endpoint with the same JWT and body. The SDK's REST client already uses requests, so this aligns the SSE path with the proven transport. Behavior preserved: same retry loop, same SSE parser, same Last-Event-ID resume on retry, same log lines. Drive-by fix: the previous implementation built opts = { ..., allow_redirects, verify, cookies, ... } but never passed those keys to httpx.stream(...) - a latent bug. requests.Session.request now wires them up properly (defaults: verify=True, allow_redirects=True when unset). No new dependency: requests is already used by the SDK's REST client. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * refactor(ometa): drop noisy comment, expose timeout in SSE stream() - Remove the fix-rationale comment from the request_kwargs block; that context lives in the PR description and would rot in-tree. - Add an optional timeout parameter to SSEClient.stream(), forwarded to requests. Defaults to None (no timeout, prior behavior). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * style(ometa): apply ruff formatting to sse_client Output of `make py_format` after the previous commits in this branch: ruff collapses a few multi-line expressions onto one line where they fit within the line width. No functional change. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(ometa): address PR review on SSE client Three follow-ups from Copilot's review on PR #28293: 1. Forward `ClientConfig.cert` to the SSE request so mTLS / client-cert setups work for SSE the same way they do for the REST client. Without this, deployments that authenticate the REST path via mutual TLS would silently fail on `/agents/dynamic/run` and similar SSE endpoints. 2. Document `data` and the new `timeout` parameter in the `stream()` docstring, including the SSE-specific note that the default of `None` disables timeouts because SSE streams can have long idle periods. 3. Migrate unit tests off `httpx` mocks: - `tests/unit/metadata/ingestion/ometa/test_sse_client.py` and `tests/unit/test_user_agent.py` now mock `requests.Session.request` instead of `httpx.Client.stream`. - `MockHTTPXClient` is replaced by `MockRequestsSession` that accepts the SDK's full kwargs set (method/url/headers/json/params/ stream/timeout/verify/allow_redirects/cookies/cert). - `MockSSEResponse.iter_lines` now accepts `decode_unicode` to match the call shape of the new code. - HTTP error / connection error fixtures use `requests.exceptions`. Not adopting Copilot's suggestion to fall back to `self.config.timeout` when the stream() arg is `None`: the REST client's default is `(10, 300)`, which would silently abort SSE streams that have legitimate idle gaps longer than 5 minutes (LLM "thinking" pauses, sparse update events). Keeping the SSE default at `None` (no timeout) preserves prior behavior and matches the working-everywhere baseline; callers can opt in. All 56 tests in the two updated files pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

…1.12.9 (#28238) * fix(migration): preempt PDTS duplicates and recover invalid index in 1.12.9 CREATE UNIQUE INDEX CONCURRENTLY aborts when it hits existing duplicate keys but leaves an invalid index behind. On migration retry, IF NOT EXISTS no-ops successfully and gets checksum-logged, after which ADD CONSTRAINT USING INDEX fails permanently with "index ... is not valid". Hit at a customer with two duplicate table.systemProfile rows on a 10M-row PDTS. Adds two idempotent statements before the existing constraint build: - DO block: drops the invalid index and clears its migration-log entry when indisvalid=false. No-op on fresh DBs and on already-migrated environments (where the index is valid and owned by the constraint). - DELETE: collapses duplicate rows via single hash aggregate on the 4-column key + targeted self-join. Reads only key columns (no json scan), only touches rows in actual duplicate groups, no-op on clean DBs. Efficient on multi-million-row PDTS tables. Existing CREATE INDEX / ALTER TABLE / ANALYZE statements byte-identical so checksum-matched skips for already-migrated environments still apply. * fix(migration): self-heal in one pass + schema-scoped invalid-index probe Addresses Copilot review on #28238: 1. One-pass self-heal. MigrationFile.parseSQLFiles filters already-logged statements at parse time (MigrationFile.java:83). Clearing the CREATE log entry from inside a DO block doesn't bring CREATE back into the current pass's execution list — it would only re-run on the next migration cycle, leaving the same-pass ALTER to fail again. Replace the "DROP + clear log" pattern with "DROP + rebuild inline" so a valid index exists before ALTER runs in the same pass. Inline rebuild uses non-concurrent CREATE UNIQUE INDEX, which takes a brief ACCESS EXCLUSIVE lock on the table. Acceptable because this path fires only when the environment is already in a degraded state. Normal-path customers go through the CONCURRENTLY build below. 2. Schema-scoped invalid-index probe. pg_class.relname is not schema-unique. Anchor the lookup via i.indrelid = 'profiler_data_time_series'::regclass and DROP by index OID (invalid_idx::regclass), so an invalid index with the same name in another schema cannot accidentally trigger this branch. Existing CREATE INDEX / ALTER TABLE / ANALYZE statements byte-identical to before this PR, so checksum-matched skips still apply for already-migrated environments. Test gap (Copilot's third comment) for the recovery scenario tracked as follow-up — existing migration tests in MigrationWorkflowReprocessingTest are mock-based; verifying recovery end-to-end needs Postgres integration infrastructure. * chore(migration): trim verbose comments in 1.12.9/postgres/schemaChanges.sql Statement bodies unchanged — checksums identical. Detailed mechanism write-ups live in the commit log / PR description; the file keeps just the load-bearing intent comment above each statement. * fix(migration): scope PDTS dedup to operation IS NOT NULL Addresses Copilot review on PR #28238 discussion r3264066840. Postgres UNIQUE treats NULLs as DISTINCT by default, so the constraint on (entityFQNHash, extension, operation, timestamp) permits multiple rows where operation IS NULL — i.e. table.tableProfile and table.columnProfile rows that share the other key columns. The previous dedup used GROUP BY which treats NULLs as equal, so it would have collapsed retry-induced tableProfile / columnProfile pairs that the restored constraint never actually blocked. Restricting the subquery to operation IS NOT NULL (plus a defensive entityFQNHash IS NOT NULL) aligns dedup with constraint semantics. DMG's customer rows were all table.systemProfile (operation = INSERT), so this still removes the customer dupes correctly. tableProfile / columnProfile retry duplicates — if they exist — stay as-is, which is the same outcome the unique constraint would produce on its own. * perf(migration): boost work_mem / maintenance_work_mem for PDTS dedup at scale Mirrors the tuning pattern from 1.9.9/postgres/postDataMigrationSQLScript.sql (same table, same operation class). On 50M-row PDTS the dedup DELETE's hash aggregate spills to disk with default work_mem=4MB, adding ~30-60s of disk I/O. Bumping work_mem to 256MB keeps the aggregate in memory; maintenance_work_mem=512MB lets CREATE UNIQUE INDEX CONCURRENTLY sort in memory too. Session-level (not SET LOCAL) because schemaChanges runs in autocommit (CREATE INDEX CONCURRENTLY requires it) — SET LOCAL would reset between statements. RESET at the end of the file restores defaults before the connection returns to the Hikari pool. Expected runtime impact at customer scale: 20M rows: ~30s tuned vs ~40s default 50M rows: ~40s tuned vs ~90s default (avoids spill) * chore(migration): trim comments on PDTS dedup additions * chore(migration): drop 1.9.9 reference from mem comment

…tityStats (#28264) * fix(data-insights): resolve owner by id in processTeam to handle bare historical refs * refactor(data-insights): isolate enricher step failures + per-step EntityStats * test(data-insights): add enricher behavior IT + rename equivalence IT * refactor(data-insights): extract VersionResolver + SnapshotMaterializer * refactor(data-insights): extract enricher concerns into step classes + bounded owner cache + rate-limited loss warns * refactor(data-insights): apply review feedback — tighten enricher docs, drop dead code * docs(data-insights): drop unsupported 'intentional' claim from TierStep prefix-match docstring * fix(data-insights): guard VersionResolver against null updatedAt on historical version rows * refactor(data-insights): tighten VersionResolver walk — consume isFirst at top, drop historyDone flag

* feat(ui): allow admins to set default landing-page panel color Adds an optional panelBackgroundColor field to ThemeConfiguration and a matching color picker on the admin Theme settings page. The landing page welcome panel now falls back to this admin-configured color when no per-user or per-persona override exists, preserving existing customization.

…28228) * fix(dq-dashboard): tier filter queries wrong ES field, showing 0 in testCase/result/resolutionStatus widgets Tier is indexed in a dedicated `tier.tagFQN` field (or `testCase.tier.tagFQN` for derived indices), not in the `tags[]` array. Several fetch functions were merging tier FQNs into the tags array and routing them through `buildMustEsFilterForTags`, which emits a `nested: { path: 'tags' }` query that never matches a tier-tagged doc. - Add `buildMustEsFilterForTier(tiers, isTestCaseResult)` in DataQualityUtils — emits a flat `term: { 'tier.tagFQN' }` or `term: { 'testCase.tier.tagFQN' }` - Fix `buildDataQualityDashboardFilters` !isTableApi branch to route tier and tags through separate filters - Fix all four bypass functions in dataQualityDashboardAPI.ts: `fetchTestCaseSummaryByNoDimension`, `fetchCountOfIncidentStatusTypeByDays`, `fetchIncidentTimeMetrics`, `fetchTestCaseStatusMetricsByDays` - Fix TagPage data_observability tab: Tier tags now pass `tier` in initialFilters and hide the tier dropdown, instead of incorrectly using the tags filter Fixes open-metadata/openmetadata-collate#4086 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix failing unit tests --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

…28305)

…versions conf/openmetadata.yaml line 720 was reading ${RDF_ENDPOINT}, but every docker-compose stack and deployment doc has always exported RDF_REMOTE_ENDPOINT. The mismatch was silently falling back to the localhost default, so any non-local Fuseki configuration was ignored and the server crashed trying to push the ontology to http://localhost:3030. Realign the placeholder with the long-standing env var name. Also extend MigrationWorkflow.validateMigrationsForServer to include the list of pending versions in the IllegalStateException message. The bare "there are pending migrations" wording forced operators to add their own logging to find out which versions were actually outstanding; listing them inline cuts straight to the failing migration. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-20T21:19:51Z

The Java checkstyle failed.

Please run mvn spotless:apply in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Java code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

sonarqubecloud · 2026-05-20T22:30:13Z

Quality Gate passed for 'open-metadata-ingestion'

Issues
3 New issues
0 Accepted issues

Measures
0 Security Hotspots
80.6% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

# Conflicts: # .github/workflows/airflow-apis-tests.yml # conf/openmetadata.yaml # ingestion/src/metadata/ingestion/source/database/mysql/connection.py # ingestion/src/metadata/utils/streamable_logger.py # ingestion/src/metadata/workflow/base.py # ingestion/tests/unit/utils/test_streamable_logger.py # openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/AlertsRuleEvaluatorResourceIT.java # openmetadata-mcp/src/main/java/org/openmetadata/mcp/tools/DefaultToolContext.java # openmetadata-service/src/main/java/org/openmetadata/service/events/subscription/AlertsRuleEvaluator.java # openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityRepository.java # openmetadata-service/src/main/java/org/openmetadata/service/resources/context/ContextMemoryResource.java # openmetadata-service/src/main/java/org/openmetadata/service/security/policyevaluator/ExpressionValidator.java # openmetadata-ui/src/main/resources/ui/playwright/e2e/Features/DataQuality/DataQualityDashboard.spec.ts # openmetadata-ui/src/main/resources/ui/playwright/e2e/Flow/ObservabilityAlerts.spec.ts # openmetadata-ui/src/main/resources/ui/playwright/utils/dataQuality.ts # openmetadata-ui/src/main/resources/ui/src/components/MyData/CustomizableComponents/CustomiseLandingPageHeader/CustomiseLandingPageHeader.test.tsx # openmetadata-ui/src/main/resources/ui/src/components/OntologyExplorer/hooks/useGraphData.ts # openmetadata-ui/src/main/resources/ui/src/components/OntologyExplorer/hooks/useOntologyGraph.ts # openmetadata-ui/src/main/resources/ui/src/locale/languages/ar-sa.json # openmetadata-ui/src/main/resources/ui/src/locale/languages/de-de.json # openmetadata-ui/src/main/resources/ui/src/locale/languages/en-us.json # openmetadata-ui/src/main/resources/ui/src/locale/languages/es-es.json # openmetadata-ui/src/main/resources/ui/src/locale/languages/fr-fr.json # openmetadata-ui/src/main/resources/ui/src/locale/languages/gl-es.json # openmetadata-ui/src/main/resources/ui/src/locale/languages/he-he.json # openmetadata-ui/src/main/resources/ui/src/locale/languages/ja-jp.json # openmetadata-ui/src/main/resources/ui/src/locale/languages/ko-kr.json # openmetadata-ui/src/main/resources/ui/src/locale/languages/mr-in.json # openmetadata-ui/src/main/resources/ui/src/locale/languages/nl-nl.json # openmetadata-ui/src/main/resources/ui/src/locale/languages/pr-pr.json # openmetadata-ui/src/main/resources/ui/src/locale/languages/pt-br.json # openmetadata-ui/src/main/resources/ui/src/locale/languages/pt-pt.json # openmetadata-ui/src/main/resources/ui/src/locale/languages/ru-ru.json # openmetadata-ui/src/main/resources/ui/src/locale/languages/th-th.json # openmetadata-ui/src/main/resources/ui/src/locale/languages/tr-tr.json # openmetadata-ui/src/main/resources/ui/src/locale/languages/zh-cn.json # openmetadata-ui/src/main/resources/ui/src/locale/languages/zh-tw.json # openmetadata-ui/src/main/resources/ui/src/rest/dataQualityDashboardAPI.test.ts # openmetadata-ui/src/main/resources/ui/src/rest/dataQualityDashboardAPI.ts # openmetadata-ui/src/main/resources/ui/src/utils/Alerts/AlertsUtil.tsx # openmetadata-ui/src/main/resources/ui/src/utils/CustomizePage/CustomizePageUtils.test.ts # openmetadata-ui/src/main/resources/ui/src/utils/DataQuality/DataQualityUtils.tsx # openmetadata-ui/src/main/resources/ui/src/utils/StringUtils.test.ts

github-actions · 2026-06-17T20:48:52Z

❌ PR checklist incomplete

This PR cannot be merged until the following are addressed on its linked issue:

No GitHub issue is linked. Link an issue in the Development section of the PR (or add Fixes #12345 to the description). For a same-org cross-repo issue, add Fixes open-metadata/<repo>#123 to the description.

The fields live on the linked issue in the Shipping project (open the issue → right sidebar → Projects). After you set them, re-run this check (or push a commit) — issue/project changes do not re-trigger it automatically.

Maintainers can bypass this check by adding the skip-pr-checks label.

github-actions · 2026-06-17T20:51:14Z

The Java checkstyle failed.

Please run mvn spotless:apply in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Java code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

gitar-bot · 2026-06-17T20:55:21Z

Code Review ⚠️ Changes requested 7 resolved / 9 findings

Implements extensive knowledge-graph features including a SPARQL playground, MCP tools, and inference registry while deprecating legacy R2RML materialization. Resolves multiple injection and validation vulnerabilities, though the CustomOntologyValidator requires a fix for class forward-references and the SPARQL UPDATE logic lacks proper URI sanitization.

⚠️

Bug: CustomOntologyValidator rejects valid forward-references between classes

📄 openmetadata-service/src/main/java/org/openmetadata/service/rdf/extension/CustomOntologyValidator.java:106-109 📄 openmetadata-service/src/main/java/org/openmetadata/service/rdf/extension/CustomOntologyValidator.java:151 📄 openmetadata-service/src/main/java/org/openmetadata/service/rdf/extension/CustomOntologyValidator.java:157-158

The validateClass method checks whether a parent class URI exists in seenUris (line 158), but seenUris is populated incrementally as classes are processed in list order (line 151). If class B declares class A as its subClassOf parent, but A appears after B in the classes list, the validator will incorrectly report "references unknown parent class" — even though both classes are declared in the same extension.

This makes validation order-dependent and will reject valid ontology extensions where the author didn't happen to declare classes in topological order. The cycle-detection method (detectClassHierarchyCycles) correctly builds the full graph before traversing, so it's not affected — only the parent-existence check is broken.

Pre-collect all class URIs before validating parent references, so declaration order doesn't matter. Change the duplicate-URI detection inside validateClass to use a separate seen-set or check before adding.

// First pass: collect all declared class URIs
Set<String> classUris = new HashSet<>();
for (CustomOntologyClass cls : classes) {
  if (cls != null && cls.getUri() != null && !cls.getUri().isBlank()) {
    classUris.add(cls.getUri());
  }
}
// Second pass: validate each class against the full set
for (CustomOntologyClass cls : classes) {
  validateClass(cls, classUris, errors);
}

💡 Security: SPARQL UPDATE built via string concat without URI validation

📄 openmetadata-service/src/main/java/org/openmetadata/service/rdf/insights/CentralityComputation.java:146-157 📄 openmetadata-service/src/main/java/org/openmetadata/service/rdf/insights/CommunityComputation.java:214-218

In CentralityComputation.persistScores and CommunityComputation.persistCommunities, entity URIs obtained from SPARQL SELECT results are concatenated directly into SPARQL UPDATE strings using <uri> syntax without any IRI validation. While valid RDF IRIs cannot contain > (so Jena-stored URIs should be safe), this is a defense-in-depth gap: if a malformed URI ever enters the triplestore (e.g., via a custom ontology upload or a bug in another component), it could break out of the <…> brackets and inject arbitrary SPARQL UPDATE operations.

The codebase already provides RdfIriValidator.validateEntityIri() for this purpose but it's not used in the insights package.

Add a basic IRI sanity check (or use RdfIriValidator) before concatenating URIs into the UPDATE string. Apply similarly in CommunityComputation.persistCommunities.

// In CentralityComputation.persistScores, before appending:
for (Map.Entry<String, Double> e : sorted) {
  String uri = e.getKey();
  if (uri == null || uri.contains(">") || uri.contains(" ") || uri.contains("
")) {
    continue; // skip malformed URIs
  }
  update.append("  <").append(uri).append("> ...");
}

✅ 7 resolved

✅ Security: SPARQL injection via insufficient URI sanitization in validateGraph

📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/rdf/RdfResource.java:197-200 📄 openmetadata-service/src/main/java/org/openmetadata/service/rdf/insights/CommunityComputation.java:202-205 📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/rdf/RdfResource.java:199 📄 openmetadata-service/src/main/java/org/openmetadata/service/rdf/insights/CommunityComputation.java:203
The validateGraph endpoint (line 199) sanitizes user-supplied entityUri by only stripping > characters before interpolating into a SPARQL query: String.format("DESCRIBE <%s>", entityUri.replace(">", "")). This is insufficient — an attacker can inject SPARQL via newlines, # comments, or other control characters. For example, entityUri = "http://x> . } DELETE WHERE { ?s ?p ?o } #" would close the angle bracket context on a newline. Since this is an admin-only endpoint the blast radius is somewhat limited, but it's still a direct injection vector against the triplestore.

The same pattern appears in CommunityComputation.persistCommunities (line 203) where member URIs from SPARQL results are directly interpolated into SPARQL UPDATE statements — if a malicious triple exists in the graph, it could inject arbitrary updates.

✅ Bug: CONSTRUCT query flattens multi-hop neighborhood onto single subject

📄 openmetadata-mcp/src/main/java/org/openmetadata/mcp/tools/EntityNeighborhoodTool.java:102-116
In EntityNeighborhoodTool.buildConstructQuery (line 102-121), the WHERE clause captures ?p ?o triples from 1-3 hops away, but then does BIND(<entityUri> AS ?s) and the CONSTRUCT template is { ?s ?p ?o }. This means ALL matched triples — even those 2-3 hops from the start entity — are emitted with the start entity as subject, producing an incorrect/flattened graph. For depth≥2, triples from intermediate nodes lose their actual subject and get attributed to the root entity.

The tool's purpose is to return the neighborhood subgraph, but consumers will see a star graph rather than the actual topology.

✅ Performance: Double symmetrization quadruples edge weights in Louvain input

📄 openmetadata-service/src/main/java/org/openmetadata/service/rdf/insights/CommunityComputation.java:161-162 📄 openmetadata-service/src/main/java/org/openmetadata/service/rdf/insights/Louvain.java:128-129
CommunityComputation.parseGraph (lines 161-162) already symmetrizes the adjacency list by adding weight in both directions for every SPARQL result row. Then Louvain.addAllEdges (lines 128-129) symmetrizes again by doing adj.get(src).merge(dst, w, Double::sum) AND adj.get(dst).merge(src, w, Double::sum). Since the input already has both A→B and B→A entries, the result is each edge getting 4× its original weight.

While this may not change community assignments (the ratio is preserved in the modularity gain formula), it inflates totalWeight and degree[] arrays by 4×, making the absolute modularity score meaningless for comparison with other runs or standard benchmarks. It also doubles memory usage for the adjacency structure.

✅ Quality: 14 fully-qualified class names in validateGraph method

📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/rdf/RdfResource.java:203-217
The validateGraph method uses inline fully-qualified names (org.apache.jena.rdf.model.Model, org.apache.jena.riot.RDFDataMgr, java.io.ByteArrayOutputStream, etc.) instead of imports. This violates the project's coding standards ('No fully qualified names') and makes the method hard to read. There are 14 FQN usages in a single 70-line method.

✅ Bug: SparqlFederationGuard race condition on lazy init

📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/rdf/RdfResource.java:113-120
In RdfResource.java lines 113-120, getFederationGuard() has a lazy-init pattern without synchronization. The field is volatile, but the check-then-assign pattern is not atomic — two threads could both see null and both create a new instance. While the consequences here are benign (both would be functionally equivalent), the field is already initialized in initialize() (line 110), so the lazy fallback is only for tests. The inconsistency with the double-checked locking pattern used for getSemanticSearchEngine() is confusing.

...and 2 more resolved from earlier reviews

🤖 Prompt for agents

Code Review: Implements extensive knowledge-graph features including a SPARQL playground, MCP tools, and inference registry while deprecating legacy R2RML materialization. Resolves multiple injection and validation vulnerabilities, though the CustomOntologyValidator requires a fix for class forward-references and the SPARQL UPDATE logic lacks proper URI sanitization.

1. ⚠️ Bug: CustomOntologyValidator rejects valid forward-references between classes
   Files: openmetadata-service/src/main/java/org/openmetadata/service/rdf/extension/CustomOntologyValidator.java:106-109, openmetadata-service/src/main/java/org/openmetadata/service/rdf/extension/CustomOntologyValidator.java:151, openmetadata-service/src/main/java/org/openmetadata/service/rdf/extension/CustomOntologyValidator.java:157-158

   The `validateClass` method checks whether a parent class URI exists in `seenUris` (line 158), but `seenUris` is populated incrementally as classes are processed in list order (line 151). If class B declares class A as its `subClassOf` parent, but A appears *after* B in the `classes` list, the validator will incorrectly report "references unknown parent class" — even though both classes are declared in the same extension.
   
   This makes validation order-dependent and will reject valid ontology extensions where the author didn't happen to declare classes in topological order. The cycle-detection method (`detectClassHierarchyCycles`) correctly builds the full graph before traversing, so it's not affected — only the parent-existence check is broken.

   Fix (Pre-collect all class URIs before validating parent references, so declaration order doesn't matter. Change the duplicate-URI detection inside validateClass to use a separate seen-set or check before adding.):
   // First pass: collect all declared class URIs
   Set<String> classUris = new HashSet<>();
   for (CustomOntologyClass cls : classes) {
     if (cls != null && cls.getUri() != null && !cls.getUri().isBlank()) {
       classUris.add(cls.getUri());
     }
   }
   // Second pass: validate each class against the full set
   for (CustomOntologyClass cls : classes) {
     validateClass(cls, classUris, errors);
   }

2. 💡 Security: SPARQL UPDATE built via string concat without URI validation
   Files: openmetadata-service/src/main/java/org/openmetadata/service/rdf/insights/CentralityComputation.java:146-157, openmetadata-service/src/main/java/org/openmetadata/service/rdf/insights/CommunityComputation.java:214-218

   In `CentralityComputation.persistScores` and `CommunityComputation.persistCommunities`, entity URIs obtained from SPARQL SELECT results are concatenated directly into SPARQL UPDATE strings using `<uri>` syntax without any IRI validation. While valid RDF IRIs cannot contain `>` (so Jena-stored URIs should be safe), this is a defense-in-depth gap: if a malformed URI ever enters the triplestore (e.g., via a custom ontology upload or a bug in another component), it could break out of the `<…>` brackets and inject arbitrary SPARQL UPDATE operations.
   
   The codebase already provides `RdfIriValidator.validateEntityIri()` for this purpose but it's not used in the insights package.

   Fix (Add a basic IRI sanity check (or use RdfIriValidator) before concatenating URIs into the UPDATE string. Apply similarly in CommunityComputation.persistCommunities.):
   // In CentralityComputation.persistScores, before appending:
   for (Map.Entry<String, Double> e : sorted) {
     String uri = e.getKey();
     if (uri == null || uri.contains(">") || uri.contains(" ") || uri.contains("
   ")) {
       continue; // skip malformed URIs
     }
     update.append("  <").append(uri).append("> ...");
   }

Options

Display: compact → Showing less information.

Comment with these commands to change:

`Compact`
`gitar display:verbose`

_{Was this helpful? React with 👍 / 👎 | Gitar}

greptile-apps · 2026-06-17T21:05:08Z

+    return uri != null && uri.startsWith(EXTENSION_NS);
+  }
+
+  /**
+   * @return true if the URI references either a canonical om: class on the allowlist or a class
+   *     declared inside the current extension.
+   */
+  private static boolean isKnownClassReference(String uri, Set<String> extensionClassUris) {
+    if (uri == null) return false;
+    if (extensionClassUris.contains(uri)) return true;
+    if (uri.startsWith(CANONICAL_NS)) {
+      String localName = uri.substring(CANONICAL_NS.length());
+      return KNOWN_CANONICAL_CLASSES.contains(localName);
+    }
+    // Allow short-form prefixed names like "om:Table".
+    if (uri.startsWith("om:")) {
+      return KNOWN_CANONICAL_CLASSES.contains(uri.substring(3));
+    }
+    return false;
+  }
+


hasCycle leaves onStack dirty, causing false-positive cycle errors

The onStack set is shared across all outer-loop iterations but is never cleared when a cycle is detected early. When hasCycle(A) finds a cycle (A → B → A) and returns true, it unwinds the stack without calling onStack.remove() for any node along the path. Nodes A and B remain in onStack. If a subsequent outer-loop call processes a node D that has an edge to B (D → B), hasCycle(B) returns true immediately because B is still in onStack — even though D is not part of any cycle. This falsely rejects valid ontology extensions with "class hierarchy contains a cycle through D".

The fix is to clear onStack before each outer-loop call (the visited set correctly prevents re-processing, so only onStack needs resetting).

greptile-apps · 2026-06-17T21:05:09Z

+  const handleSaveCurrent = useCallback(() => {
+    const name = window.prompt(t('label.sparql-save-prompt'));
+    if (!name || !name.trim()) {
+      return;
+    }


window.prompt() is a native blocking dialog that is suppressed in most embedded or iframe contexts, returns null when dismissed without input, and is not accessible on mobile. Consider replacing with an inline rename input or a modal form consistent with the rest of the UI; it would also allow pre-validation of the name before saving.

Suggested change

const handleSaveCurrent = useCallback(() => {

const name = window.prompt(t('label.sparql-save-prompt'));

if (!name || !name.trim()) {

return;

}

const [saveDialogOpen, setSaveDialogOpen] = useState(false);

const handleSaveCurrent = useCallback(() => {

setSaveDialogOpen(true);

}, []);

// Replace the prompt with a controlled modal/input in the JSX and wire

// the confirmed name through a separate callback.

const commitSave = useCallback((name: string) => {

if (!name || !name.trim()) {

return;

}

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

greptile-apps · 2026-06-17T21:05:10Z

+      throw new IllegalArgumentException(label + " contains illegal characters");
+    }
+    URI parsed;
+    try {
+      parsed = new URI(trimmed);
+    } catch (URISyntaxException e) {
+      throw new IllegalArgumentException(label + " is not a valid URI: " + e.getMessage());
+    }
+    if (!parsed.isAbsolute()) {
+      throw new IllegalArgumentException(label + " must be an absolute URI");
+    }
+    String scheme = parsed.getScheme();
+    if (!"http".equalsIgnoreCase(scheme) && !"https".equalsIgnoreCase(scheme)) {
+      throw new IllegalArgumentException(
+          label + " must use http or https scheme (got: " + scheme + ")");
+    }
+    return trimmed;
+  }
+
+  /** Clamp maxHops to [1, HARD_MAX_HOPS]; null or < 1 falls back to DEFAULT_MAX_HOPS. */
+  public static int clampMaxHops(Integer requested) {
+    if (requested == null || requested < 1) return DEFAULT_MAX_HOPS;


validateNodeUri doesn't reuse the shared RdfIriValidator

RdfIriValidator.validateEntityIri was introduced in this PR specifically to provide a single, reusable IRI validator. validateNodeUri reimplements the same checks but misses a few characters that RdfIriValidator also blocks: ", ', `, and control characters below 0x20. For angle-bracket SPARQL IRI literals (<uri>), only > must be blocked to prevent injection, so the current behaviour is safe — but keeping two divergent validators risks them drifting apart over time. Consider delegating to RdfIriValidator.validateEntityIri and translating null to an IllegalArgumentException.

greptile-apps · 2026-06-17T21:05:11Z

+    if (errors.size() > 0) {
+      return errors;
+    }


Minor style: errors.size() > 0 is less idiomatic than !errors.isEmpty(). The latter is also marginally more efficient for certain List implementations.

Suggested change

if (errors.size() > 0) {

return errors;

}

if (!errors.isEmpty()) {

return errors;

}

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

sonarqubecloud · 2026-06-17T22:06:40Z

Quality Gate passed for 'open-metadata-ingestion'

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

Copilot AI review requested due to automatic review settings May 11, 2026 18:29

harshach requested review from a team, akash-jain-10 and tutte as code owners May 11, 2026 18:29

github-actions Bot added backend safe to test Add this label to run secure Github workflows on PRs labels May 11, 2026

Copilot started reviewing on behalf of harshach May 11, 2026 18:30 View session

gitar-bot Bot reviewed May 11, 2026

View reviewed changes

Comment thread openmetadata-service/src/main/java/org/openmetadata/service/resources/rdf/RdfResource.java Outdated

gitar-bot Bot reviewed May 11, 2026

View reviewed changes

Comment thread openmetadata-mcp/src/main/java/org/openmetadata/mcp/tools/EntityNeighborhoodTool.java Outdated

gitar-bot Bot reviewed May 11, 2026

View reviewed changes

Comment thread ...tadata-service/src/main/java/org/openmetadata/service/rdf/insights/CommunityComputation.java Outdated

gitar-bot Bot reviewed May 11, 2026

View reviewed changes

Comment thread openmetadata-service/src/main/java/org/openmetadata/service/resources/rdf/RdfResource.java Outdated

gitar-bot Bot reviewed May 11, 2026

View reviewed changes

Comment thread openmetadata-service/src/main/java/org/openmetadata/service/resources/rdf/RdfResource.java Outdated

Copilot AI reviewed May 11, 2026

View reviewed changes

harshach temporarily deployed to test May 11, 2026 18:52 — with GitHub Actions Inactive

harshach had a problem deploying to test May 11, 2026 18:52 — with GitHub Actions Error

Copilot AI review requested due to automatic review settings May 11, 2026 19:48

Copilot started reviewing on behalf of harshach May 11, 2026 19:49 View session

gitar-bot Bot reviewed May 11, 2026

View reviewed changes

Comment thread openmetadata-mcp/src/main/java/org/openmetadata/mcp/tools/ShaclValidateTool.java Outdated

Copilot AI reviewed May 11, 2026

View reviewed changes

harshach temporarily deployed to test May 11, 2026 19:58 — with GitHub Actions Inactive

pmbrull and others added 15 commits May 20, 2026 14:15

fix(ui): return 404 for unknown service category instead of fallback …

0fe4d81

…to databases (#28288) * fix(ui): return 404 for unknown service category instead of fallback to databases * fix ui checkstyle

fix: correct typos (#26784)

3737c8e

fix: Custom Relation badge always taking default color (#28301)

5114baf

fix(ui): accept ServiceCategory enum values as :tab in ServicesPage (#…

5711162

…28305)

harshach had a problem deploying to test May 20, 2026 21:26 — with GitHub Actions Failure

harshach temporarily deployed to test May 20, 2026 21:29 — with GitHub Actions Inactive

Fix Java checkstyle

a5b6ef9

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

greptile-apps Bot reviewed Jun 17, 2026

View reviewed changes

Conversation

harshach commented May 11, 2026 • edited by greptile-apps Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Describe your changes

Type of change

Tests

UI screen recording / screenshots

Summary by Gitar

Greptile Summary

Confidence Score: 3/5

Important Files Changed

Sequence Diagram

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented May 11, 2026

✅ TypeScript Types Auto-Updated

Uh oh!

github-actions Bot commented May 11, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented May 11, 2026

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented May 20, 2026

Uh oh!

sonarqubecloud Bot commented May 20, 2026

Quality Gate passed for 'open-metadata-ingestion'

Uh oh!

github-actions Bot commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

❌ PR checklist incomplete

Uh oh!

github-actions Bot commented Jun 17, 2026

Uh oh!

gitar-bot Bot commented Jun 17, 2026

Uh oh!

greptile-apps Bot Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

sonarqubecloud Bot commented Jun 17, 2026

Quality Gate passed for 'open-metadata-ingestion'

Uh oh!

Reviewers

Assignees

Labels

Projects

harshach commented May 11, 2026 •

edited by greptile-apps Bot

Loading

github-actions Bot commented Jun 17, 2026 •

edited

Loading