feat: add semantic_scholar client by e06084 · Pull Request #420 · MigoXLab/dingo

e06084 · 2026-06-06T12:43:46Z

No description provided.

gemini-code-assist

Code Review

This pull request introduces a new Semantic Scholar backend (SemanticScholarClient) for retrieval evaluation and updates the retrieval execution logic to store and expose the execution summary. The feedback highlights several critical improvements: in retrieval.py, initializing self.summary with SummaryModel() could fail if there are required fields, so initializing it to None is recommended. In the new Semantic Scholar backend, holding a lock during time.sleep() in _rate_limit_wait blocks other threads and should be avoided by sleeping outside the lock. Additionally, the backend should defensively validate that the API response is a dictionary and that papers is a list, and normalize the paper scores using the number of returned papers rather than the total database matches to ensure consistent scoring.
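The locking point in the summary can be illustrated with a minimal sketch. The `RateLimiter` class below is a hypothetical stand-in, not the PR's `_rate_limit_wait` implementation, and the attribute names are assumptions: each thread reserves its slot while holding the lock, then sleeps only after releasing it.

```python
import threading
import time


class RateLimiter:
    """Hypothetical stand-in for the backend's rate-limit logic."""

    def __init__(self, min_interval: float) -> None:
        self._min_interval = min_interval
        self._lock = threading.Lock()
        self._next_allowed = 0.0  # monotonic timestamp of the next free slot

    def wait(self) -> float:
        # Reserve a slot while holding the lock; this part is quick.
        with self._lock:
            now = time.monotonic()
            slot = max(now, self._next_allowed)
            self._next_allowed = slot + self._min_interval
            delay = slot - now
        # Sleep after the lock is released so other threads can
        # reserve their own slots in the meantime.
        if delay > 0:
            time.sleep(delay)
        return delay


limiter = RateLimiter(min_interval=0.01)
first_delay = limiter.wait()   # no earlier request, so no wait
second_delay = limiter.wait()  # waits out the rest of the interval
```

Because the next slot is computed under the lock, concurrent callers still get distinct, evenly spaced slots even though none of them blocks the others while sleeping.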


gemini-code-assist · 2026-06-06T12:45:10Z

+        self.summary = SummaryModel()
+
+    def get_summary(self):
+        return self.summary


Initializing self.summary to SummaryModel() without arguments is highly likely to raise a TypeError or ValidationError at runtime if SummaryModel has required fields (such as task_id, task_name, etc.) and no default values. It is safer to initialize self.summary to None and type it as SummaryModel | None.

Suggested change

-        self.summary = SummaryModel()
-
-    def get_summary(self):
-        return self.summary
+        self.summary: SummaryModel | None = None
+
+    def get_summary(self) -> SummaryModel | None:
+        return self.summary
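A minimal sketch of the failure mode and the suggested pattern, assuming a model with required fields. The `SummaryModel` here is a hypothetical dataclass stand-in (the real class and its fields are not shown in this PR), and `Executor` is an illustrative name only. A dataclass raises `TypeError` on a no-argument call, which mirrors the kind of error the review anticipates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SummaryModel:
    """Hypothetical stand-in: both fields are required, no defaults."""
    task_id: str
    task_name: str


class Executor:
    """Illustrative owner of the summary; not the PR's actual class."""

    def __init__(self) -> None:
        # No summary exists until a run has finished.
        self.summary: Optional[SummaryModel] = None

    def execute(self) -> None:
        # Placeholder values; a real run would fill these in.
        self.summary = SummaryModel(task_id="t-1", task_name="retrieval")

    def get_summary(self) -> Optional[SummaryModel]:
        return self.summary


executor = Executor()
before = executor.get_summary()  # None: nothing has run yet
executor.execute()
after = executor.get_summary()
```

Callers then have to handle the `None` case explicitly, which makes "no run yet" distinguishable from an empty summary.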

gemini-code-assist · 2026-06-06T12:45:10Z

+            data = resp.json()
+            papers = data.get("data") or []
+            total = data.get("total", len(papers))


If the API returns a valid JSON response that is not a dictionary (e.g., a list or a string), calling data.get() will raise an AttributeError. Additionally, if data.get("data") is not a list, iterating over papers or calling len(papers) could raise a TypeError. To ensure robust error handling and defensive programming, we should explicitly verify that data is a dictionary and papers is a list before processing them.

            data = resp.json()
            if not isinstance(data, dict):
                return SearchResponse(
                    query=query,
                    results=[],
                    response_time_ms=elapsed_ms,
                    status_code=resp.status_code,
                    error="Invalid JSON response format: expected a dictionary",
                )
            papers = data.get("data")
            if not isinstance(papers, list):
                papers = []
            total = data.get("total", len(papers))
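The same checks can be exercised in isolation. The helper below is a hypothetical sketch (`extract_papers` and its tuple return shape are assumptions, not part of the PR); it applies the two `isinstance` guards to an already decoded payload and reports an error string instead of building a `SearchResponse`.

```python
from typing import Any, List, Optional, Tuple


def extract_papers(data: Any) -> Tuple[List[Any], int, Optional[str]]:
    """Hypothetical helper: return (papers, total, error) from a decoded payload."""
    # Valid JSON can decode to a list, string, number, or None.
    if not isinstance(data, dict):
        return [], 0, "Invalid JSON response format: expected a dictionary"
    papers = data.get("data")
    # A missing or non-list "data" field is treated as no results.
    if not isinstance(papers, list):
        papers = []
    total = data.get("total", len(papers))
    return papers, total, None


not_a_dict = extract_papers(["unexpected", "list"])
missing_data = extract_papers({"data": None})
normal = extract_papers({"data": [{"title": "A"}], "total": 7})
```

Here `not_a_dict` carries the error message, `missing_data` yields an empty result list with no error, and `normal` passes the list and the reported total through unchanged.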

gemini-code-assist · 2026-06-06T12:45:10Z

+                    a.get("name", "") for a in authors_raw if isinstance(a, dict)
+                ]
+                year = paper.get("year")
+                score = 1.0 - (i / max(total, 1))


Using the total number of search results in the entire database (total) to normalize the rank-based score makes the scores highly inconsistent and dependent on the query's popularity. For queries with a very large number of matches, the scores of all top retrieved papers will be extremely close to 1.0, compressing the score range and losing relative ranking distinction. It is much more consistent to normalize the score based on the number of returned papers (len(papers)).

Suggested change

-                score = 1.0 - (i / max(total, 1))
+                score = 1.0 - (i / len(papers)) if papers else 0.0
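A small sketch of the compression effect, assuming `i` is the zero-based rank of each returned paper. The helper name `rank_scores` and the figures (5 returned papers, 100000 total matches) are illustrative assumptions, not values from the PR.

```python
from typing import List


def rank_scores(n_returned: int, denominator: int) -> List[float]:
    """Rank-based scores for n_returned papers, normalized by denominator."""
    return [1.0 - (i / max(denominator, 1)) for i in range(n_returned)]


# Normalizing by the total number of matches in the database:
by_total = rank_scores(n_returned=5, denominator=100000)

# Normalizing by the number of papers actually returned:
by_returned = rank_scores(n_returned=5, denominator=5)

spread_by_total = max(by_total) - min(by_total)
spread_by_returned = max(by_returned) - min(by_returned)
```

With these numbers, normalizing by the total leaves every score above 0.9999, while normalizing by the returned count spreads the scores from 1.0 down to roughly 0.2.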

e06084 added 2 commits June 6, 2026 19:42

feat: retrieval add summary

5f9c0ce

feat: add semantic_scholar client

af613dc

gemini-code-assist Bot reviewed Jun 6, 2026

x

b47aa3a

e06084 merged commit 4c7cdd7 into MigoXLab:dev Jun 6, 2026
2 checks passed

e06084 merged 3 commits into MigoXLab:dev from e06084:dev