feat: retrival detail add matched_count #422

Merged: e06084 merged 3 commits into MigoXLab:dev from e06084:dev on Jun 8, 2026
e06084 (Collaborator) commented on Jun 8, 2026

No description provided.

gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request enhances MTEB search adapter traces by attaching relevant documents (qrels) so that mapped matches can be distinguished from relevant matches, updates the Semantic Scholar backend documentation, and adds corresponding unit tests. One critical review comment points out that _attach_relevant_docs will raise AttributeError on standard MTEB tasks, because it iterates over task.dataset and calls .get() on Hugging Face Dataset objects. The comment suggests retrieving the qrels directly from task.relevant_docs instead.

Comment thread: dingo/exec/retrieval.py
Comment on lines +174 to +182

for hf_subset, splits in getattr(task, "dataset", {}).items():
    for hf_split, data_split in splits.items():
        relevant_docs = data_split.get("relevant_docs", {})
        model.set_relevant_docs(
            task_name,
            hf_split,
            hf_subset,
            relevant_docs,
        )

critical

The current implementation of _attach_relevant_docs has two critical bugs that will cause it to crash on almost all MTEB retrieval tasks:

  1. AttributeError on standard tasks: For standard retrieval tasks (without subsets), task.dataset is a Hugging Face DatasetDict that maps split names to Dataset objects. Iterating getattr(task, "dataset", {}).items() therefore yields (split name, Dataset) pairs, even though the loop binds them to hf_subset and splits. The inner loop then calls splits.items() on what is actually a Dataset object, which raises AttributeError: 'Dataset' object has no attribute 'items'.
  2. AttributeError on Dataset.get: Even if the task has subsets and splits is a DatasetDict, data_split will be a Hugging Face Dataset object. Calling data_split.get("relevant_docs", {}) will raise AttributeError: 'Dataset' object has no attribute 'get' because Dataset does not have a .get() method.
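
The two failure modes above can be reproduced without MTEB or the datasets library. The following is a minimal sketch under stated assumptions: FakeDataset, attach_from_dataset, and error_for are hypothetical names introduced only for illustration, FakeDataset merely imitates the fact that a Dataset offers neither .items() nor .get(), and plain dicts stand in for DatasetDict.

```python
class FakeDataset:
    """Hypothetical stand-in for a Hugging Face Dataset: no .items(), no .get()."""

    def __init__(self, rows):
        self.rows = rows


def attach_from_dataset(task_dataset):
    """Mirror the reviewed loop over task.dataset and collect what it finds."""
    found = []
    for hf_subset, splits in task_dataset.items():
        for hf_split, data_split in splits.items():
            found.append((hf_subset, hf_split, data_split.get("relevant_docs", {})))
    return found


def error_for(task_dataset):
    """Run the loop and return the AttributeError message, or None on success."""
    try:
        attach_from_dataset(task_dataset)
    except AttributeError as exc:
        return str(exc)
    return None


# Case 1: no subsets, so the outer mapping is split name -> Dataset.
flat = {"test": FakeDataset([])}
# Case 2: subsets present, so the mapping is subset -> split -> Dataset.
nested = {"default": {"test": FakeDataset([])}}

flat_error = error_for(flat)      # fails at splits.items()
nested_error = error_for(nested)  # fails at data_split.get(...)
print(flat_error)
print(nested_error)
```

In this sketch both layouts fail, only at different lines, which matches the reviewer's point that the loop cannot succeed for either shape of task.dataset.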

Instead, you should retrieve the loaded qrels directly from task.relevant_docs (which is a standard dictionary of split -> qrels populated after task.load_data()), and get the subset name from task.hf_subset (defaulting to 'default').

Suggested change

- for hf_subset, splits in getattr(task, "dataset", {}).items():
-     for hf_split, data_split in splits.items():
-         relevant_docs = data_split.get("relevant_docs", {})
-         model.set_relevant_docs(
-             task_name,
-             hf_split,
-             hf_subset,
-             relevant_docs,
-         )
+ hf_subset = getattr(task, "hf_subset", "default")
+ relevant_docs_dict = getattr(task, "relevant_docs", {})
+ if relevant_docs_dict:
+     for hf_split, relevant_docs in relevant_docs_dict.items():
+         model.set_relevant_docs(
+             task_name,
+             hf_split,
+             hf_subset,
+             relevant_docs,
+         )
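
To illustrate how the suggested approach behaves, here is a small sketch that exercises the same logic against stubs. It is not the code merged in this PR: RecordingModel, attach_relevant_docs, the task name "DemoRetrieval", and the qrels shape {query_id: {doc_id: relevance}} are assumptions made for the example, and SimpleNamespace stands in for a loaded MTEB task.

```python
from types import SimpleNamespace


class RecordingModel:
    """Hypothetical model stub that records every set_relevant_docs call."""

    def __init__(self):
        self.calls = []

    def set_relevant_docs(self, task_name, hf_split, hf_subset, relevant_docs):
        self.calls.append((task_name, hf_split, hf_subset, relevant_docs))


def attach_relevant_docs(model, task, task_name):
    """Pass each split's qrels from task.relevant_docs to the model."""
    # Fall back to "default" when the task defines no subset name.
    hf_subset = getattr(task, "hf_subset", "default")
    # A missing or empty relevant_docs simply produces no calls.
    relevant_docs_dict = getattr(task, "relevant_docs", {})
    for hf_split, relevant_docs in relevant_docs_dict.items():
        model.set_relevant_docs(task_name, hf_split, hf_subset, relevant_docs)


# Stub task: relevant_docs maps split -> assumed qrels shape.
task = SimpleNamespace(relevant_docs={"test": {"q1": {"d1": 1, "d7": 1}}})
model = RecordingModel()
attach_relevant_docs(model, task, "DemoRetrieval")

# A task object without a relevant_docs attribute results in no calls.
empty_model = RecordingModel()
attach_relevant_docs(empty_model, SimpleNamespace(), "DemoRetrieval")

print(model.calls)
print(empty_model.calls)
```

Because the lookup uses getattr with defaults, a task that has not loaded its data degrades to a no-op rather than raising, which is the main behavioral difference from iterating over task.dataset.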

e06084 merged commit df7e1eb into MigoXLab:dev on Jun 8, 2026
2 checks passed