RAG: include exploration history in section selection prompt
Context
In DoclingRAGAgent._rag_loop, the section selector receives only a set of visited refs. It does not see why each section was picked, whether it helped, or what was still missing.
Current prompt fragment in _select_section (rag.py:298-307):
f"Already consulted section refs: {sorted(visited) or 'none'}\n\n"
On transversal queries like "compare X and Y" or "what are the tradeoffs", this leads the model to revisit semantically close sections and burn iterations until it hits max_iterations.
Proposal
Pass the existing iterations: list[RAGIteration] to _select_section and inject a short history into the prompt. RAGIteration already has section_ref, reason, can_answer. No new model needed.
History block:
history_text = "\n".join(
    f" - {it.section_ref}: {it.reason} -> {'helpful' if it.can_answer else 'not helpful'}"
    for it in iterations
) or "none"
The prompt then explicitly asks the model to base its next pick on what it has already learned and to avoid sections similar to ones already consulted.
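A minimal sketch of how the selection prompt could be assembled around the history block. The RAGIteration dataclass below is a stand-in carrying only the three fields named in this proposal. build_selection_prompt and the instruction wording are hypothetical, not the actual text in rag.py.

```python
from dataclasses import dataclass


@dataclass
class RAGIteration:
    # Stand-in for the real model; only the fields named in this proposal.
    section_ref: str
    reason: str
    can_answer: bool


def format_history(iterations: list[RAGIteration]) -> str:
    """Render one line per past iteration, or 'none' when there is no history."""
    return "\n".join(
        f" - {it.section_ref}: {it.reason} -> "
        f"{'helpful' if it.can_answer else 'not helpful'}"
        for it in iterations
    ) or "none"


def build_selection_prompt(
    query: str, outline_text: str, iterations: list[RAGIteration]
) -> str:
    # Hypothetical wording; the real prompt lives in _select_section.
    return (
        f"Question: {query}\n\n"
        f"Document outline:\n{outline_text}\n\n"
        f"Exploration history:\n{format_history(iterations)}\n\n"
        "Choose the next section based on what the history shows is still "
        "missing. Avoid sections similar to ones already consulted.\n"
    )
```

On the first iteration the list is empty, so the history renders as "none", matching the behavior of the current Already consulted line.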
No change to output prompts
All the fields we want to surface are already produced by the LLM today:
- _select_section already returns reason via SectionSelection (rag.py:287)
- _attempt_answer already returns can_answer and response via AnswerAttempt (rag.py:392)
- _rag_loop already stores all of this in RAGIteration (rag.py:215)
The patch is read-only on the output side. We just feed the existing trace back into the next selection prompt. No new fields to ask the model for, no schema change in rag_models.py.
Trade-off on signal quality
reason and can_answer are self-reports from the same model that picked the section. They carry real noise:
- reason is often a post-hoc rationalization driven by surface overlap with the outline
- can_answer=false can mean "section was unhelpful" or "the model failed to extract from a useful section"
We accept this trade-off because the alternative (no memory at all) is worse, and because keeping the history short (reason + helpful/not helpful) limits how far a noisy step can mislead the next one.
If validation shows the noise dominates, follow-ups can: weight recent iterations only, cap history to the last N steps, or add a sanity reread.
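One of these follow-ups, capping the history to the last N steps, could look like the sketch below. recent_history and max_steps are hypothetical names, and the default of 3 is an arbitrary placeholder, not a tuned value.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def recent_history(iterations: Sequence[T], max_steps: int = 3) -> list[T]:
    """Return at most the last `max_steps` iterations, oldest first."""
    if max_steps <= 0:
        # Guard: iterations[-0:] would return the whole sequence, not nothing.
        return []
    return list(iterations[-max_steps:])
```

Only the rendered history would be capped. The set of visited refs should still be derived from the full list so that older sections are not reselected.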
Why a separate history block, not an annotated outline
Annotating outline_text inline (e.g. marking visited nodes) was considered. Rejected because it mixes two concerns in one structure (document shape vs exploration trace) and makes it harder to cap history length independently of the outline. The separate block is also closer in shape to existing RAGIteration records, which keeps the diff small.
Changes
In rag.py:
- _select_section: add iterations param, replace the Already consulted line with the history block.
- _rag_loop: pass iterations in the call (rag.py:170).
- Derive visited from iterations (single source of truth, not optional).
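Deriving visited from iterations can be a one-line comprehension. A sketch, assuming each record exposes section_ref as described above. visited_refs is a hypothetical helper name and the Protocol exists only to keep the example self-contained.

```python
from typing import Iterable, NamedTuple, Protocol


class HasSectionRef(Protocol):
    section_ref: str


def visited_refs(iterations: Iterable[HasSectionRef]) -> set[str]:
    """Collect the refs of every section consulted so far."""
    return {it.section_ref for it in iterations}


class _Step(NamedTuple):
    # Illustrative record; the real code would pass RAGIteration objects.
    section_ref: str


steps = [_Step("#/texts/3"), _Step("#/texts/9"), _Step("#/texts/3")]
print(sorted(visited_refs(steps)))
```

Because the set is rebuilt from the trace on each call, it cannot drift out of sync with what the history block shows.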
Tests:
- Add a unit test for _select_section with a mocked Mellea session, asserting the history block appears in the prompt at iteration N+1 and reflects prior RAGIteration content.
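A rough shape for that test, using unittest.mock in place of a real session. select_section below is a hypothetical stand-in for DoclingRAGAgent._select_section, and the assumption that the prompt is passed as the first positional argument to session.instruct would need to be checked against how rag.py actually calls the session.

```python
from dataclasses import dataclass
from unittest.mock import MagicMock


@dataclass
class RAGIteration:
    # Stand-in with only the fields named in this proposal.
    section_ref: str
    reason: str
    can_answer: bool


def select_section(session, query: str, iterations: list[RAGIteration]):
    # Hypothetical stand-in for the patched _select_section.
    history_text = "\n".join(
        f" - {it.section_ref}: {it.reason} -> "
        f"{'helpful' if it.can_answer else 'not helpful'}"
        for it in iterations
    ) or "none"
    prompt = f"Question: {query}\n\nExploration history:\n{history_text}\n"
    # Assumption: the session exposes an instruct(prompt) call.
    return session.instruct(prompt)


def test_history_block_reaches_prompt():
    session = MagicMock()
    iterations = [
        RAGIteration("#/texts/3", "mentions X", False),
        RAGIteration("#/texts/9", "defines Y", True),
    ]
    select_section(session, "compare X and Y", iterations)

    # Inspect the prompt the mocked session received.
    prompt = session.instruct.call_args.args[0]
    assert "#/texts/3: mentions X -> not helpful" in prompt
    assert "#/texts/9: defines Y -> helpful" in prompt
```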
About 15 lines of production diff plus the test.
Validation plan
Fixed query set, evaluated before and after the patch.
- 3 buckets: localized answer, transversal, answer not in document
- at least 10 queries per bucket
- 3 runs per query (local backends are not strictly deterministic at temperature 0)
- report mean and standard deviation of:
  - iterations to convergence
  - convergence rate
  - number of revisits to semantically close sections
Expected: no change on localized queries, fewer iterations on transversal ones, faster give-up when the answer is not in the document.
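The reporting step could be aggregated as in the sketch below. RunResult and summarize are hypothetical names. The sketch assumes statistics are reported per bucket, that non-converged runs count their full iteration budget, and that revisits have already been counted by whatever similarity criterion the evaluation settles on.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class RunResult:
    bucket: str        # "localized", "transversal" or "not_in_document"
    iterations: int    # iterations used by this run
    converged: bool    # whether the loop produced an answer
    revisits: int      # revisits to semantically close sections


def summarize(runs: list[RunResult]) -> dict[str, dict[str, float]]:
    """Aggregate the validation metrics per bucket."""
    by_bucket: dict[str, list[RunResult]] = {}
    for run in runs:
        by_bucket.setdefault(run.bucket, []).append(run)

    summary: dict[str, dict[str, float]] = {}
    for bucket, rs in by_bucket.items():
        iters = [r.iterations for r in rs]
        revisits = [r.revisits for r in rs]
        summary[bucket] = {
            "iterations_mean": mean(iters),
            # stdev needs at least two data points.
            "iterations_std": stdev(iters) if len(iters) > 1 else 0.0,
            "convergence_rate": sum(r.converged for r in rs) / len(rs),
            "revisits_mean": mean(revisits),
            "revisits_std": stdev(revisits) if len(revisits) > 1 else 0.0,
        }
    return summary
```

Running summarize once on the pre-patch results and once on the post-patch results gives two tables that can be compared bucket by bucket.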