Ask Ontos handbook — deferred follow-ups from PR #472 review
Captures three items raised by @larsgeorge-db during the review of #472 (PR 1 of the Ask Ontos uplift) that we intentionally deferred from that PR to keep the scope focused. Filing here so they aren't lost.
1) Encode the handbook corpus as RDF / extend the Ontos ontology
@larsgeorge-db suggested that the handbook content might better live as an RDF-based extension on top of the Ontos ontology, which would make it SPARQL-queryable alongside customer ontologies and dogfood Ontos's own modeling story.
Why it's interesting
Conceptually elegant: Ontos is an ontology platform, so encoding its own self-description as triples is real dogfooding.
SPARQL-queryable from any consumer, not just Ask Ontos. A "tell me everything Ontos knows about contracts" query becomes possible alongside customer-ontology queries.
After the concepts → handbook rename freed up the "Concept" namespace, this is now an architecturally clean path.
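To make the "tell me everything Ontos knows about contracts" idea concrete, here is a minimal sketch in plain Python. It is not the proposed implementation: plain tuples stand in for an RDF store, and every name in it (the ontos: and handbook: prefixes, the predicates, the file path) is a hypothetical placeholder. A real version would presumably use an RDF library and a SPARQL query along the lines of SELECT ?p ?o WHERE { ontos:DataContract ?p ?o }.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical handbook facts. None of these names come from the Ontos ontology;
# they only illustrate the subject / predicate / object shape.
HANDBOOK: List[Triple] = [
    ("ontos:DataContract", "rdfs:label", "Data Contract"),
    ("ontos:DataContract", "handbook:describedIn", "handbook/contracts.md"),
    ("ontos:DataContract", "handbook:relatedTo", "ontos:DataProduct"),
    ("ontos:DataProduct", "rdfs:label", "Data Product"),
]


def match(
    triples: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples matching the pattern; None is a wildcard, like a SPARQL variable."""
    pattern = (s, p, o)
    for triple in triples:
        if all(want is None or want == got for want, got in zip(pattern, triple)):
            yield triple


def everything_about(triples: List[Triple], subject: str) -> List[Tuple[str, str]]:
    """Return every (predicate, object) pair recorded for one subject."""
    return [(p, o) for _, p, o in match(triples, s=subject)]
```

Calling everything_about(HANDBOOK, "ontos:DataContract") returns the three pairs recorded for that subject. The same pattern would run unchanged over customer triples, which is the appeal of a shared store.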
Why it's deferred
Markdown has properties that RDF does not match well for the current corpus shape:
Narrative flow (the DQX walkthrough is 6 sequential steps — modeling that as triples loses readability).
Easy to author in any text editor, easy to review in a PR diff by non-RDF-literate contributors.
Headings give the LLM natural sections; triples lack inherent structure for prose.
The current corpus is 14 files. Conversion overhead is high vs. the marginal queryability benefit at this scale.
Triggers to revisit
Corpus passes ~50 docs and grep produces too many false-positive hits.
A customer wants to SPARQL-query "what does Ontos know about X" alongside their own ontology.
We need to render the same fact in multiple surfaces (UI tooltip + LLM context + API).
2) Embedding-based retrieval to replace grep
@larsgeorge-db noted that embeddings would unlock better recall for the handbook, and that #469 (Ontology Term Mapping — bulk suggest & apply) will need embedding infrastructure anyway.
Plan
When the #469 work ([PRD]: Ontology Term Mapping (Bulk Suggest & Apply)) wires up embeddings (likely a Databricks vector-search index, or a local FAISS index for self-contained mode), search_ontos_handbook becomes a free rider on that infrastructure.
Embeddings replace the current grep ranking: each section's body is embedded at index time, and each query is embedded and matched by similarity search.
Bonus: cross-lingual retrieval improves naturally (German query → English doc match via shared embedding space). This indirectly addresses the multi-language gap noted below.
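The plan above amounts to three steps: embed each section, embed the query, rank by similarity. Here is a rough sketch, under loud assumptions. The embedder is injected, and the one used here is a bag-of-words stand-in so the code runs without a model; it is purely lexical and would not give the cross-lingual benefit. The section texts, the vocabulary and all function names are hypothetical, and nothing here reflects how #469 will actually expose embeddings.

```python
import math
import re
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Embedder = Callable[[str], Vector]


def make_toy_embedder(vocabulary: List[str]) -> Embedder:
    """Return a bag-of-words embedder over a fixed vocabulary.

    Stand-in only: a real setup would call an embedding model here.
    """
    position = {word: i for i, word in enumerate(vocabulary)}

    def embed(text: str) -> Vector:
        vec = [0.0] * len(vocabulary)
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            if token in position:
                vec[position[token]] += 1.0
        return vec

    return embed


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_index(sections: Dict[str, str], embed: Embedder) -> Dict[str, Vector]:
    """Embed every section body once, keyed by section title."""
    return {title: embed(body) for title, body in sections.items()}


def search(
    index: Dict[str, Vector], query: str, embed: Embedder, top_k: int = 3
) -> List[Tuple[str, float]]:
    """Embed the query and return the top_k sections by cosine similarity."""
    q = embed(query)
    scored = [(title, cosine(q, vec)) for title, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical handbook sections, not taken from the real corpus.
SECTIONS = {
    "Data contracts": "A data contract describes the schema and quality rules of a dataset.",
    "Glossary": "The glossary lists business terms and their definitions.",
}
VOCAB = ["contract", "schema", "quality", "glossary", "terms", "definitions"]
EMBED = make_toy_embedder(VOCAB)
INDEX = build_index(SECTIONS, EMBED)
RESULTS = search(INDEX, "which schema does a contract enforce", EMBED)
```

Because the embedder is a parameter, swapping the toy for a real model, or replacing the in-memory dictionary with a vector index, should not change the shape of search.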
Until then
Grep is fine for the current 14-file corpus. Latency is single-digit ms, recall is acceptable, and the LLM can paraphrase raw markdown.
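For comparison, a grep-style baseline can be approximated as below. This is an assumption-laden sketch, not the actual search_ontos_handbook code: it splits markdown at headings and ranks sections by raw substring hits. The function names and the sample document are hypothetical.

```python
import re
from typing import List, Tuple


def split_sections(markdown: str) -> List[Tuple[str, str]]:
    """Split a markdown document into (heading, body) pairs at ATX headings."""
    sections: List[Tuple[str, str]] = []
    heading, body = "(preamble)", []
    for line in markdown.splitlines():
        found = re.match(r"^#{1,6}\s+(.*)$", line)
        if found:
            # Close the previous section, skipping an empty preamble.
            if body or heading != "(preamble)":
                sections.append((heading, "\n".join(body).strip()))
            heading, body = found.group(1).strip(), []
        else:
            body.append(line)
    sections.append((heading, "\n".join(body).strip()))
    return sections


def grep_rank(markdown: str, query: str) -> List[Tuple[str, int]]:
    """Rank sections by case-insensitive substring hits for each query term."""
    terms = re.findall(r"\w+", query.lower())
    ranked = []
    for heading, body in split_sections(markdown):
        text = (heading + "\n" + body).lower()
        hits = sum(text.count(term) for term in terms)
        if hits:
            ranked.append((heading, hits))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Hypothetical two-section document.
DOC = """# Contracts
A contract pins the schema of a dataset.

# Glossary
Business terms live here.
"""
```

Substring counting is what makes this cheap at 14 files. It is also why false positives grow with the corpus, and why a German query such as "Vertrag" matches nothing in English text.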
3) Full multi-locale translation of the handbook corpus
The handbook is English-only; the Ontos UI ships in 7 locales (en, de, es, fr, it, ja, nl). PR #472 added a prompt instruction telling the LLM to answer in the user's language but keep Ontos UI labels in English — that handles the common case but is not a substitute for native-language grounding material.
Why it's deferred
Translation + maintenance overhead is high (every handbook edit ×7 locales).
The LLM is reasonably good at on-the-fly translation from English source.
Triggers to revisit
Cross-links
This issue was written by Isaac.