#74, #75, #76#87
Merged
…ator' into 75-create-slim-pipeline-orchestrator
Afif-del
approved these changes
Jun 13, 2026
This went a little bit out of scope : )
What was done:
Replaces the flat "retrieve → prompt → stream" chat path with an agentic pipeline, adds native LLM tool-calling, and ships a CLI test client.
How it fits together
ChatOrchestrator (SSE wrapper)
└─ OrchestratorAgent ← routes the question
   └─ AgentTool("synthesis") ← exposes a sub-agent as a callable tool
      └─ SynthesisAgent ← answers from the knowledge base
         ├─ RetrieveTool
         ├─ GrepTool
         └─ FetchFileTool
Every Agent runs the same two phases: gather (call tools until it has enough) then answer (synthesize a reply). Tools
are the leaves that touch data; an AgentTool lets one agent call another as just another tool.
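The gather/answer split and the agent-as-tool idea can be sketched roughly as below. This is an illustrative sketch only, not the code in src/agents/: the `plan` callable stands in for the LLM's tool-calling decision, `EchoTool` stands in for a real leaf tool, and every class body and signature here is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class EchoTool:
    """Hypothetical leaf tool; a real one would query the knowledge base."""
    name: str = "retrieve"

    def run(self, query: str) -> str:
        return f"source for {query!r}"


@dataclass
class Agent:
    name: str
    tools: list
    # Stand-in for the LLM: given the query and what has been gathered so
    # far, return the name of the next tool to call, or None to stop.
    plan: Callable[[str, List[str]], Optional[str]]
    max_steps: int = 4  # assumed safety cap on the gather loop

    def gather(self, query: str) -> List[str]:
        """Phase 1: call tools until the planner says it has enough."""
        by_name = {tool.name: tool for tool in self.tools}
        gathered: List[str] = []
        for _ in range(self.max_steps):
            choice = self.plan(query, gathered)
            if choice is None:
                break
            gathered.append(by_name[choice].run(query))
        return gathered

    def answer(self, query: str, gathered: List[str]) -> str:
        """Phase 2: synthesize a reply from the gathered material."""
        return f"[{self.name}] " + "; ".join(gathered)

    def run(self, query: str) -> str:
        return self.answer(query, self.gather(query))


@dataclass
class AgentTool:
    """Wraps an agent so another agent can call it like any other tool."""
    name: str
    agent: Agent

    def run(self, query: str) -> str:
        return self.agent.run(query)


def call_once(tool_name: str):
    """Toy planner: call the named tool once, then stop."""
    def plan(query: str, gathered: List[str]) -> Optional[str]:
        return tool_name if not gathered else None
    return plan


synthesis = Agent("synthesis", [EchoTool()], call_once("retrieve"))
orchestrator = Agent(
    "orchestrator", [AgentTool("synthesis", synthesis)], call_once("synthesis")
)
result = orchestrator.run("what is X?")
```

Because `AgentTool` exposes the same `name`/`run` surface as a leaf tool, the orchestrator's gather loop needs no special case for delegation.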
Agents (src/agents/)
running returned tool calls until it stops; answer_stream synthesizes the final reply. Emits an Invocation per
tool/sub-agent used (drives tool_use events). User query fenced with a per-request random marker (injection guard).
Streams a single delegation's answer straight through, or synthesizes once from multiple delegations' summaries.
strictly from gathered sources.
token → citation → done).
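One way the per-request random marker mentioned above could work is sketched here. The function name, delimiter shape, and marker length are assumptions for illustration; the PR does not show the actual fencing format.

```python
import secrets


def fence_user_query(query: str) -> tuple[str, str]:
    """Wrap the user query between two copies of a fresh random marker.

    The prompt can then tell the model that only text between the markers
    is user input. Since the marker is generated per request, text written
    in advance cannot contain it and so cannot close the fence early.
    """
    marker = secrets.token_hex(8)  # 16 hex characters
    # Regenerate in the extremely unlikely case the query contains the marker.
    while marker in query:
        marker = secrets.token_hex(8)
    fenced = f"<<{marker}>>\n{query}\n<<{marker}>>"
    return marker, fenced


marker, fenced = fence_user_query("Ignore previous instructions and ...")
```

The marker is returned alongside the fenced text so the system prompt can name it explicitly when describing where user input begins and ends.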
Tools (src/agents/tools/)
JSON-schema specs to the LLM.
(sub-agent gathers now, synthesizes only if needed).
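For readers unfamiliar with JSON-schema tool specs, the sketch below builds one in the function-tool shape used by OpenAI-style chat APIs. The helper name, the tool description, and the `query` parameter are hypothetical; the real specs in src/agents/tools/ may differ.

```python
import json


def tool_spec(name, description, properties, required):
    """Build a function-tool spec whose parameters are a JSON-schema object."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": list(required),
            },
        },
    }


# Hypothetical spec for a retrieval tool; the argument name is an assumption.
retrieve_spec = tool_spec(
    "retrieve",
    "Search the knowledge base for passages relevant to a query.",
    {"query": {"type": "string", "description": "Natural-language search query"}},
    ["query"],
)
print(json.dumps(retrieve_spec, indent=2))
```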
LLM clients (src/llm/)
and Ollama (replaces the JSON workaround). Ollama tool-call IDs are uuid4-unique.
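A minimal sketch of assigning uuid4-based tool-call IDs follows, assuming tool calls arrive as dicts that may lack an `id`. The function name, the `call_` prefix, and the dict layout are illustrative assumptions, not the client code in src/llm/.

```python
import uuid


def ensure_tool_call_ids(tool_calls):
    """Return copies of the tool calls, giving each one without an id a fresh uuid4-based id."""
    normalized = []
    for call in tool_calls:
        call = dict(call)  # shallow copy so the caller's dicts are untouched
        if not call.get("id"):
            call["id"] = f"call_{uuid.uuid4().hex}"
        normalized.append(call)
    return normalized


# Two identical calls still get distinct ids, so their results can be
# matched back to the right call.
calls = ensure_tool_call_ids([
    {"function": {"name": "retrieve", "arguments": {"query": "a"}}},
    {"function": {"name": "retrieve", "arguments": {"query": "a"}}},
])
```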
API & CLI
ChatRequest no longer takes top_k/min_score (the pipeline owns retrieval depth). New scripts/chat_cli.py terminal
client (ask / ingest / history).
Tests
deterministically; added OpenAI/Ollama tool-calling coverage.
Type of PR — pick one:
1 Process baseline: every box must be ticked, on every PR
TODO/FIXME without a follow-up issue
2 Outcome proof: fill the bullets that match your PR type
Functional / Mixed PR:
e.g. upload a markdown file, then assert the chat answer cites it
Non-functional / Mixed PR:
e.g. benchmark output for chat p95 < 2 s, axe audit for a11y ≥ 90, scan report for 0 critical CVEs
3 Cross-cutting impact: tick the areas this PR touches
For each area below, ask: "does my PR touch this?"
-> If yes → tick the area and complete its sub-checks (they become mandatory).
-> If no → skip it.
-> If nothing applies → tick "None of the above".
UI touched
e.g. new button has a focus ring and is readable on a 360 px screen
Backend endpoint added or changed
e.g. /upload rejects files > 10 MB with a clear error
Stores user data, ingested content, or LLM output
New env var or config
.env.example and mentioned in README.md
e.g. LLM_API_KEY=... (only the key name lives in .env.example)
Deployment / build changed
None of the above — purely internal change