[Improvement]: Ask Ontos handbook — deferred follow-ups from PR #472 review

## Title

Ask Ontos handbook — deferred follow-ups from PR #472 review

## Body

Captures three items raised by @larsgeorge-db during the review of #472 (PR 1 of the Ask Ontos uplift) that we intentionally deferred from that PR to keep the scope focused. Filing here so they aren't lost.

## 1) Encode the handbook corpus as RDF / extend the Ontos ontology

@larsgeorge-db [suggested](https://github.com/databrickslabs/ontos/pull/472#discussion_r...) that the handbook content might better live as an RDF-based extension on top of the Ontos ontology — making it SPARQL-queryable alongside customer ontologies, and dogfooding Ontos's own modeling story.

**Why it's interesting**
- Conceptually elegant: Ontos *is* an ontology platform, so encoding its own self-description as triples is real dogfooding.
- SPARQL-queryable from any consumer, not just Ask Ontos. A "tell me everything Ontos knows about contracts" query becomes possible alongside customer-ontology queries.
- Reuses the existing RDF + ontology infrastructure (triple store, ontology import, semantic links).
- After the `concepts` → `handbook` rename freed up the "Concept" namespace, this is now an architecturally clean path.
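
To make the idea concrete, here is a minimal sketch of one handbook section expressed as triples, with a pattern match standing in for a SPARQL query. It uses plain Python tuples rather than a triple store, and every prefix and predicate (`handbook:`, `ontos:`, `handbook:describes`) is a hypothetical placeholder, not part of the actual Ontos ontology.

```python
from typing import Iterator, Optional

Triple = tuple[str, str, str]

# Hypothetical vocabulary: none of these names come from the real Ontos ontology.
TRIPLES: list[Triple] = [
    ("handbook:contracts", "rdf:type", "handbook:Section"),
    ("handbook:contracts", "rdfs:label", "Data contracts"),
    ("handbook:contracts", "handbook:describes", "ontos:Contract"),
    ("handbook:dqx-walkthrough", "rdf:type", "handbook:Section"),
    ("handbook:dqx-walkthrough", "rdfs:label", "DQX walkthrough"),
    ("handbook:dqx-walkthrough", "handbook:describes", "ontos:QualityCheck"),
]


def match(
    s: Optional[str] = None, p: Optional[str] = None, o: Optional[str] = None
) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a wildcard."""
    for triple in TRIPLES:
        if (
            (s is None or triple[0] == s)
            and (p is None or triple[1] == p)
            and (o is None or triple[2] == o)
        ):
            yield triple


def sections_about(concept: str) -> list[str]:
    """Return labels of handbook sections that describe the given concept."""
    labels: list[str] = []
    for subject, _, _ in match(p="handbook:describes", o=concept):
        labels.extend(label for _, _, label in match(s=subject, p="rdfs:label"))
    return labels


print(sections_about("ontos:Contract"))
```

In a real implementation the triples would live in the existing triple store and the lookup would be a SPARQL query; the sketch only shows the shape of the data and of the question being asked.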

**Why it's deferred**
- Markdown has properties that RDF does not match well for the current corpus shape:
  - Narrative flow (the DQX walkthrough is 6 sequential steps — modeling that as triples loses readability).
  - Easy to author in any text editor, easy to review in a PR diff by non-RDF-literate contributors.
  - Headings give the LLM natural sections; triples lack inherent structure for prose.
- The current corpus is 14 files. Conversion overhead is high vs. the marginal queryability benefit at this scale.

**Triggers to revisit**
- Corpus passes ~50 docs and grep produces too many false-positive hits.
- A customer wants to SPARQL-query "what does Ontos know about X" alongside their own ontology.
- We need to render the same fact in multiple surfaces (UI tooltip + LLM context + API).

## 2) Embedding-based retrieval to replace grep

@larsgeorge-db [noted](https://github.com/databrickslabs/ontos/pull/472#discussion_r...) that embeddings would unlock better recall for the handbook, and that #469 (Ontology Term Mapping — bulk suggest & apply) will need embedding infrastructure anyway.

**Plan**
- When the #469 work wires up embeddings (likely a Databricks vector-search index, or a local FAISS index for self-contained mode), `search_ontos_handbook` can reuse that infrastructure at little extra cost.
- Embeddings replace the current grep ranking. Each section's body is embedded once; each query is embedded at request time and matched by similarity search.
- Bonus: cross-lingual retrieval improves naturally (German query → English doc match via shared embedding space). This indirectly addresses the multi-language gap noted below.
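
The plan above could look roughly like the following sketch. `toy_embed` is a loudly labeled stand-in: it is a bag-of-words count vector, not a real embedding model, so it has none of the semantic or cross-lingual behavior described above. The function names and sample sections are hypothetical; the real index would come from whatever #469 introduces.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(sections: dict[str, str]) -> list[tuple[str, Counter]]:
    """Embed each section body once, keyed by its heading."""
    return [(heading, toy_embed(body)) for heading, body in sections.items()]


def search(query: str, index: list[tuple[str, Counter]], k: int = 3) -> list[str]:
    """Embed the query and return the k most similar section headings."""
    query_vec = toy_embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [heading for heading, _ in ranked[:k]]


# Hypothetical sample sections, not taken from the actual handbook.
SECTIONS = {
    "Data contracts": "A contract describes schema and quality rules.",
    "Semantic links": "Semantic links connect assets to ontology terms.",
}

index = build_index(SECTIONS)
print(search("quality rules for a contract", index, k=1))
```

Swapping `toy_embed` for a real model and the sorted list for a vector index would leave the embed-then-rank structure unchanged.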

**Until then**
- Grep is fine for the current 14-file corpus. Latency is single-digit ms, recall is acceptable, and the LLM can paraphrase raw markdown.
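
For comparison, a grep-style ranking over heading-delimited sections can be sketched as below. This is an illustration of the general technique only; it is not the actual `search_ontos_handbook` implementation, and the function names and sample document are hypothetical.

```python
import re


def split_sections(markdown: str) -> list[tuple[str, str]]:
    """Split markdown into (heading, body) pairs at lines starting with '#'."""
    sections: list[tuple[str, str]] = []
    heading, body = "", []
    for line in markdown.splitlines():
        if line.startswith("#"):
            sections.append((heading, "\n".join(body).strip()))
            heading, body = line.lstrip("#").strip(), []
        else:
            body.append(line)
    sections.append((heading, "\n".join(body).strip()))
    # Drop the empty preamble produced when the file starts with a heading.
    return [(h, b) for h, b in sections if h or b]


def grep_rank(query: str, sections: list[tuple[str, str]]) -> list[tuple[int, str]]:
    """Score each section by whole-word hits of the query terms, best first."""
    terms = re.findall(r"\w+", query.lower())
    scored: list[tuple[int, str]] = []
    for heading, body in sections:
        text = f"{heading}\n{body}".lower()
        score = sum(len(re.findall(rf"\b{re.escape(t)}\b", text)) for t in terms)
        if score:
            scored.append((score, heading))
    scored.sort(key=lambda item: -item[0])  # stable: ties keep document order
    return scored


# Hypothetical sample document, not taken from the actual handbook.
DOC = """# Data contracts
A contract describes the schema and quality rules for a data product.

# DQX walkthrough
Step 1: pick a table. Step 2: attach a contract.
"""

print(grep_rank("contract schema", split_sections(DOC)))
```

Note that whole-word matching in this sketch does not count "contracts" as a hit for "contract", which is the kind of recall gap an embedding-based approach is meant to close.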

## 3) Full multi-locale translation of the handbook corpus

The handbook is English-only; the Ontos UI ships in 7 locales (en, de, es, fr, it, ja, nl). PR #472 added a prompt instruction telling the LLM to answer in the user's language but keep Ontos UI labels in English — that handles the common case but is not a substitute for native-language grounding material.

**Why it's deferred**
- Translation + maintenance overhead is high (every handbook edit ×7 locales).
- The LLM is reasonably good at on-the-fly translation from English source.
- The cheap prompt-rule fix (already shipped in #472) is likely sufficient for v1.

**Triggers to revisit**
- Customer evidence of friction (non-English users reporting that translated answers are off).
- The embedding work in item 2 above makes multi-locale retrieval easier (just embed each translated doc into the same vector space).

## Cross-links

- PR #472 — initial Ask Ontos uplift (where this feedback originated)
- #469 — Ontology Term Mapping (Bulk Suggest & Apply) — the natural home for embedding infrastructure
- #280 — Ask Ontos roadmap parent

This issue was written by Isaac.

