[Improvement]: Ask Ontos handbook — deferred follow-ups from PR #472 review #489

@mvkonchits-db

Captures three items raised by @larsgeorge-db during the review of #472 (PR 1 of the Ask Ontos uplift) that we intentionally deferred from that PR to keep the scope focused. Filing here so they aren't lost.

1) Encode the handbook corpus as RDF / extend the Ontos ontology

@larsgeorge-db suggested that the handbook content might better live as an RDF-based extension on top of the Ontos ontology — making it SPARQL-queryable alongside customer ontologies, and dogfooding Ontos's own modeling story.

Why it's interesting

  • Conceptually elegant: Ontos is an ontology platform, so encoding its own self-description as triples is real dogfooding.
  • SPARQL-queryable from any consumer, not just Ask Ontos. A "tell me everything Ontos knows about contracts" query becomes possible alongside customer-ontology queries.
  • Reuses the existing RDF + ontology infrastructure (triple store, ontology import, semantic links).
  • After the concepts → handbook rename freed up the "Concept" namespace, this is now an architecturally clean path.
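
To make the "tell me everything Ontos knows about contracts" idea concrete, here is a minimal sketch of handbook facts held as subject/predicate/object triples with a wildcard pattern match. It is plain Python, not real RDF or SPARQL, and it does not touch the Ontos triple store. Every identifier in it (the `handbook:` prefix, `describedIn`, `stepCount`, `contracts.md`) is hypothetical and chosen only for illustration.

```python
from typing import Iterator, Optional

Triple = tuple[str, str, str]

# Hypothetical handbook facts. Names and predicates are illustrative only
# and do not come from the Ontos ontology.
HANDBOOK_TRIPLES: list[Triple] = [
    ("handbook:DataContract", "rdf:type", "handbook:Topic"),
    ("handbook:DataContract", "rdfs:label", "Data contract"),
    ("handbook:DataContract", "handbook:describedIn", "contracts.md"),
    ("handbook:DQXWalkthrough", "rdf:type", "handbook:Topic"),
    ("handbook:DQXWalkthrough", "handbook:stepCount", "6"),
]


def match(
    triples: list[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a wildcard."""
    for triple in triples:
        if (
            (s is None or triple[0] == s)
            and (p is None or triple[1] == p)
            and (o is None or triple[2] == o)
        ):
            yield triple


def describe(triples: list[Triple], subject: str) -> dict[str, list[str]]:
    """Collect every predicate/object pair known about one subject."""
    facts: dict[str, list[str]] = {}
    for _, predicate, obj in match(triples, s=subject):
        facts.setdefault(predicate, []).append(obj)
    return facts
```

Calling `describe(HANDBOOK_TRIPLES, "handbook:DataContract")` gathers all facts about that one topic. In a real triple store, a SPARQL `SELECT ?p ?o WHERE { <subject> ?p ?o }` query would play the role of `describe`.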

Why it's deferred

  • Markdown has properties that RDF does not match well for the current corpus shape:
    • Narrative flow (the DQX walkthrough is 6 sequential steps — modeling that as triples loses readability).
    • Easy to author in any text editor, easy to review in a PR diff by non-RDF-literate contributors.
    • Headings give the LLM natural sections; triples lack inherent structure for prose.
  • The current corpus is 14 files. Conversion overhead is high vs. the marginal queryability benefit at this scale.

Triggers to revisit

  • Corpus passes ~50 docs and grep produces too many false-positive hits.
  • A customer wants to SPARQL-query "what does Ontos know about X" alongside their own ontology.
  • We need to render the same fact in multiple surfaces (UI tooltip + LLM context + API).

2) Embedding-based retrieval to replace grep

@larsgeorge-db noted that embeddings would unlock better recall for the handbook, and that #469 (Ontology Term Mapping — bulk suggest & apply) will need embedding infrastructure anyway.

Plan

  • When the #469 work (Ontology Term Mapping, Bulk Suggest & Apply) wires up embeddings (likely a Databricks vector-search index or a local FAISS index for self-contained mode), search_ontos_handbook becomes a free-rider on that infrastructure.
  • Embeddings replace the current grep ranking. Each section's body gets embedded; queries embed and similarity-search.
  • Bonus: cross-lingual retrieval improves naturally (German query → English doc match via shared embedding space). This indirectly addresses the multi-language gap noted below.
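
The embed-then-similarity-search flow in the plan above could look roughly like the sketch below. The `embed` function is a toy bag-of-words stand-in so the example runs without dependencies. A real implementation would call whatever embedding model #469 settles on, and the toy version gives none of the cross-lingual benefit. All function names here are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(sections: dict[str, str]) -> dict[str, Counter]:
    """Embed each section body once, keyed by section title."""
    return {title: embed(body) for title, body in sections.items()}


def search(
    index: dict[str, Counter], query: str, top_k: int = 3
) -> list[tuple[str, float]]:
    """Embed the query and return the top_k most similar sections."""
    query_vec = embed(query)
    scored = [(title, cosine(query_vec, vec)) for title, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

The index is built once per corpus change and only the query is embedded at request time, which is the same shape a FAISS or vector-search backend would have.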

Until then

  • Grep is fine for the current 14-file corpus. Latency is single-digit ms, recall is acceptable, and the LLM can paraphrase raw markdown.
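
For comparison, a grep-style ranking over markdown sections can be sketched as follows. This is an assumption about the general shape, not the actual search_ontos_handbook code. It splits a document on headings and scores each section by whole-word hits. `split_sections` and `grep_sections` are hypothetical names.

```python
import re


def split_sections(markdown: str) -> list[tuple[str, str]]:
    """Split a markdown document into (heading, body) pairs on ATX headings."""
    sections: list[tuple[str, str]] = []
    heading, lines = "(preamble)", []
    for line in markdown.splitlines():
        m = re.match(r"#{1,6}\s+(.*)", line)
        if m:
            # Flush the previous section, skipping an empty preamble.
            if lines or heading != "(preamble)":
                sections.append((heading, "\n".join(lines).strip()))
            heading, lines = m.group(1).strip(), []
        else:
            lines.append(line)
    sections.append((heading, "\n".join(lines).strip()))
    return sections


def grep_sections(markdown: str, query: str) -> list[tuple[str, int]]:
    """Rank sections by the number of whole-word query-term hits."""
    terms = re.findall(r"\w+", query.lower())
    hits: list[tuple[str, int]] = []
    for heading, body in split_sections(markdown):
        text = f"{heading}\n{body}".lower()
        score = sum(
            len(re.findall(rf"\b{re.escape(term)}\b", text)) for term in terms
        )
        if score:
            hits.append((heading, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Whole-word matching means a query for "contract" does not hit "contracts", which is one source of the recall gap that embeddings are expected to close.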

3) Full multi-locale translation of the handbook corpus

The handbook is English-only; the Ontos UI ships in 7 locales (en, de, es, fr, it, ja, nl). PR #472 added a prompt instruction telling the LLM to answer in the user's language but keep Ontos UI labels in English — that handles the common case but is not a substitute for native-language grounding material.
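
As a rough illustration of that prompt instruction, the sketch below builds a language directive from the user's locale. The wording is illustrative and is not the text added in PR #472. The function name and the fallback to English for unknown locales are assumptions.

```python
# Locales the Ontos UI ships in, per the issue text.
SUPPORTED_LOCALES = ("en", "de", "es", "fr", "it", "ja", "nl")


def language_instruction(locale: str) -> str:
    """Return a prompt fragment for the given locale (hypothetical wording)."""
    lang = locale.split("-")[0].lower()
    if lang not in SUPPORTED_LOCALES:
        lang = "en"  # Assumed fallback; the real behavior is not specified.
    return (
        f"Answer in the user's language (locale: {lang}). "
        "Keep Ontos UI labels in English exactly as they appear in the product."
    )
```

This only steers the answer language. The grounding material the LLM reads is still English, which is the gap this item tracks.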

Why it's deferred

Triggers to revisit

  • Customer evidence of friction (non-English users reporting that translated answers are off).
  • The embedding work above (item 2) lands, making multi-locale retrieval easier (each translated doc can be embedded into the same vector space).

Cross-links

This issue was written by Isaac.
