RAG: include exploration history in section selection prompt #25

Description

@pjmalandrino

Context

In DoclingRAGAgent._rag_loop, the section selector receives only a set of visited refs. It does not see why each section was picked, whether it helped, or what was still missing.

Current prompt fragment in _select_section (rag.py:298-307):

f"Already consulted section refs: {sorted(visited) or 'none'}\n\n"

On transversal queries such as "compare X and Y" or "what are the tradeoffs", where the answer is spread across several sections, this leads the model to revisit semantically close sections and burn iterations until it hits max_iterations.

Proposal

Pass the existing iterations: list[RAGIteration] to _select_section and inject a short history into the prompt. RAGIteration already has section_ref, reason, and can_answer, so no new data model is needed.

History block:

history_text = "\n".join(
    f"  - {it.section_ref}: {it.reason} -> {'helpful' if it.can_answer else 'not helpful'}"
    for it in iterations
) or "none"

The prompt then explicitly asks the model to base its next pick on what it has already learned and to avoid sections similar to ones already consulted.
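As a rough illustration of how the history block could sit in the selection prompt, here is a minimal sketch. The RAGIteration dataclass is a stand-in limited to the three fields named above, and build_selection_prompt, its parameters, and the surrounding prompt wording are hypothetical, not the actual code in rag.py.

```python
from dataclasses import dataclass


@dataclass
class RAGIteration:
    """Stand-in for the real record in rag_models.py (only the fields used here)."""
    section_ref: str
    reason: str
    can_answer: bool


def format_history(iterations: list[RAGIteration]) -> str:
    # Same expression as the history block proposed in this issue.
    return "\n".join(
        f"  - {it.section_ref}: {it.reason} -> "
        f"{'helpful' if it.can_answer else 'not helpful'}"
        for it in iterations
    ) or "none"


def build_selection_prompt(
    question: str, outline_text: str, iterations: list[RAGIteration]
) -> str:
    # Hypothetical prompt layout; only the history block comes from the proposal.
    return (
        f"Question: {question}\n\n"
        f"Document outline:\n{outline_text}\n\n"
        f"Exploration history (section: reason -> outcome):\n"
        f"{format_history(iterations)}\n\n"
        "Base your next pick on what has already been learned, and avoid "
        "sections similar to ones already consulted.\n"
    )
```

On the first iteration the list is empty, so the history renders as "none", matching the behaviour of the current Already consulted line.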

No change to output prompts

All the fields we want to surface are already produced by the LLM today:

  • _select_section already returns reason via SectionSelection (rag.py:287)
  • _attempt_answer already returns can_answer and response via AnswerAttempt (rag.py:392)
  • _rag_loop already stores all of this in RAGIteration (rag.py:215)

The patch is read-only on the output side. We just feed the existing trace back into the next selection prompt. No new fields to ask the model for, no schema change in rag_models.py.

Trade-off on signal quality

reason and can_answer are self-reports from the same model that picked the section. They carry real noise:

  • reason is often a post-hoc rationalization driven by surface overlap with the outline
  • can_answer=false can mean "section was unhelpful" or "the model failed to extract from a useful section"

We accept this trade-off because the alternative (no memory at all) is worse, and because keeping the history short (reason + helpful/not helpful) limits how far a noisy step can mislead the next one.

If validation shows the noise dominates, follow-ups can: weight recent iterations only, cap history to the last N steps, or add a sanity reread.
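One of these follow-ups, capping history to the last N steps, could look roughly like the sketch below. The helper name recent_history and the default of 3 steps are assumptions for illustration; RAGIteration is again a stand-in with only the fields discussed here.

```python
from dataclasses import dataclass


@dataclass
class RAGIteration:
    """Stand-in for the real record in rag_models.py (only the fields used here)."""
    section_ref: str
    reason: str
    can_answer: bool


def recent_history(iterations: list[RAGIteration], max_steps: int = 3) -> str:
    # Guard against max_steps <= 0: iterations[-0:] would return the whole list.
    if max_steps <= 0:
        return "none"
    recent = iterations[-max_steps:]  # keep only the most recent steps
    return "\n".join(
        f"  - {it.section_ref}: {it.reason} -> "
        f"{'helpful' if it.can_answer else 'not helpful'}"
        for it in recent
    ) or "none"
```

Because the history lives in its own block, this cap can be tuned without touching the outline text.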

Why a separate history block, not an annotated outline

Annotating outline_text inline (e.g. marking visited nodes) was considered. Rejected because it mixes two concerns in one structure (document shape vs exploration trace) and makes it harder to cap history length independently of the outline. The separate block is also closer in shape to existing RAGIteration records, which keeps the diff small.

Changes

In rag.py:

  1. _select_section: add iterations param, replace the Already consulted line with the history block.
  2. _rag_loop: pass iterations in the call (rag.py:170).
  3. Derive visited from iterations so that iterations is the single source of truth. This part is required, not optional.
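Deriving visited from iterations (change 3) can be as small as a set comprehension. The sketch below assumes section_ref is a hashable string; the helper name visited_refs is hypothetical and RAGIteration is a stand-in for the real record.

```python
from dataclasses import dataclass


@dataclass
class RAGIteration:
    """Stand-in for the real record in rag_models.py (only the fields used here)."""
    section_ref: str
    reason: str
    can_answer: bool


def visited_refs(iterations: list[RAGIteration]) -> set[str]:
    # The set is recomputed from the trace, so it cannot drift out of sync with it.
    return {it.section_ref for it in iterations}
```

With this in place, the loop would no longer need to maintain a separate visited set alongside iterations.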

Tests:

  1. Add a unit test for _select_section with a mocked Mellea session, asserting the history block appears in the prompt at iteration N+1 and reflects prior RAGIteration content.
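The shape of such a test might be close to the following sketch. Everything here is a placeholder: select_section stands in for _select_section, the session is a plain MagicMock, and the instruct method name is an assumption used only so the mock has something to record, not a claim about the Mellea API.

```python
from dataclasses import dataclass
from unittest.mock import MagicMock


@dataclass
class RAGIteration:
    """Stand-in for the real record in rag_models.py (only the fields used here)."""
    section_ref: str
    reason: str
    can_answer: bool


def select_section(session, question: str, iterations: list[RAGIteration]):
    """Hypothetical stand-in for _select_section: builds the prompt, calls the session."""
    history = "\n".join(
        f"  - {it.section_ref}: {it.reason} -> "
        f"{'helpful' if it.can_answer else 'not helpful'}"
        for it in iterations
    ) or "none"
    prompt = f"Question: {question}\nExploration history:\n{history}\n"
    return session.instruct(prompt)  # placeholder call name on the mocked session


def test_history_block_reaches_prompt():
    session = MagicMock()
    prior = [RAGIteration("#/texts/12", "mentions pricing", False)]

    select_section(session, "compare X and Y", prior)

    # Inspect the prompt the mocked session received at iteration N+1.
    prompt = session.instruct.call_args.args[0]
    assert "#/texts/12: mentions pricing -> not helpful" in prompt
```

The real test would call _select_section on the agent and mock whatever session method it actually uses; the assertion on the prompt content is the part that carries over.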

About 15 lines of production diff plus the test.

Validation plan

Fixed query set, evaluated before and after the patch.

  • 3 buckets: localized answer, transversal, answer not in document
  • at least 10 queries per bucket
  • 3 runs per query (local backends are not strictly deterministic at temperature 0)
  • report mean and standard deviation of:
    • iterations to convergence
    • convergence rate
    • number of revisits to semantically close sections

Expected: no change on localized queries, fewer iterations on transversal ones, faster give-up when the answer is not in the document.
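For the reporting step, a small aggregation helper along these lines could compute the per-bucket statistics. The RunResult record and its field names are assumptions for illustration, and how revisits to semantically close sections are counted is left to whatever produces the records.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class RunResult:
    """Hypothetical record for one run of one query."""
    bucket: str        # e.g. "localized", "transversal", "not_in_document"
    iterations: int    # iterations used before stopping
    converged: bool    # whether the run produced an answer
    revisits: int      # revisits to semantically close sections (counting method assumed)


def summarize(runs: list[RunResult]) -> dict[str, dict[str, float]]:
    by_bucket: dict[str, list[RunResult]] = {}
    for run in runs:
        by_bucket.setdefault(run.bucket, []).append(run)

    summary = {}
    for bucket, rs in by_bucket.items():
        iters = [r.iterations for r in rs]
        revisits = [r.revisits for r in rs]
        summary[bucket] = {
            "iterations_mean": mean(iters),
            # stdev needs at least two samples; report 0.0 otherwise
            "iterations_std": stdev(iters) if len(iters) > 1 else 0.0,
            "convergence_rate": mean(1.0 if r.converged else 0.0 for r in rs),
            "revisits_mean": mean(revisits),
            "revisits_std": stdev(revisits) if len(revisits) > 1 else 0.0,
        }
    return summary
```

Running summarize once on the pre-patch runs and once on the post-patch runs gives two tables that can be compared bucket by bucket.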
