feat: --hybrid flag routes file-analyzer batches to local Gemma via Ollama (~4-5× cost reduction) by Itsthewayofyou · Pull Request #176 · Egonex-AI/Understand-Anything

Itsthewayofyou · 2026-05-22T16:32:53Z

Summary

Adds --hybrid flag to /understand skill that routes the expensive file-analyzer extraction phase to a local Gemma model via Ollama instead of Claude subagents
Introduces hybrid_runner.py — a self-contained Python orchestrator that handles tree-sitter extraction + Gemma semantic analysis + schema-safe post-processing
Reduces Claude API cost by ~4-5× on medium/large repos. Architecture-reasoning phases (architecture-analyzer, tour-builder, domain-analyzer) stay on Claude.

How it works

# Standard (unchanged)
/understand

# Hybrid — file-analyzer batches go to local Gemma
/understand --hybrid

# Configure the model/endpoint
OLLAMA_MODEL=gemma4:26b-a4b OLLAMA_HOST=http://localhost:11434 /understand --hybrid
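As a rough illustration of how a runner might pick up these settings, here is a minimal sketch. The `HybridConfig` class and `load_hybrid_config` function are hypothetical names, not taken from hybrid_runner.py, and the 600-second timeout is an assumed placeholder; only the environment variable names and the default model come from this PR.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class HybridConfig:
    host: str
    model: str
    timeout_s: float


def load_hybrid_config(env=None) -> HybridConfig:
    """Read hybrid-mode settings from the environment, with fallbacks."""
    env = os.environ if env is None else env
    return HybridConfig(
        # Ollama's usual local endpoint; trim a trailing slash so paths join cleanly.
        host=env.get("OLLAMA_HOST", "http://localhost:11434").rstrip("/"),
        # Default model as named in this PR.
        model=env.get("OLLAMA_MODEL", "gemma4:26b-a4b"),
        # Assumption: 600 is a placeholder, not the PR's actual default.
        timeout_s=float(env.get("HYBRID_TIMEOUT", "600")),
    )
```

Passing a plain dict as `env` keeps the function easy to exercise without touching the real environment.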

Routing table:

Phase	Standard	Hybrid
project-scanner	Claude	Claude (low cost, dynamic script)
file-analyzer ×N batches	Claude	Gemma via Ollama
assemble-reviewer	Claude	Claude (semantic judgment)
architecture-analyzer	Claude	Claude
tour-builder	Claude	Claude
domain-analyzer	Claude	Claude

Key design decisions

Why only file-analyzer? It's ~80% of total token cost — runs in 5 concurrent batches of 20-30 files each. The other agents are either low-cost or require Claude-level reasoning quality.

Import edge injection: Gemma tends to re-resolve imports from source instead of strictly using batchImportData. hybrid_runner.py strips all Gemma-emitted imports edges and reinserts them deterministically from batchImportData (pre-resolved by project-scanner). This makes import graphs identical to standard mode regardless of model quality.
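The strip-and-reinject step described above could look roughly like the sketch below. It assumes `batchImportData` is a mapping from a source file path to a list of resolved import paths, that edges are dicts with `source`, `target`, and `type` keys, and that file node IDs use a `file:` prefix; all three are assumptions for illustration, and `reinject_import_edges` is a hypothetical name.

```python
def reinject_import_edges(graph, batch_import_data):
    """Drop model-emitted `imports` edges and rebuild them from scanner data."""
    # Keep every edge the model produced except its own `imports` guesses.
    kept = [e for e in graph.get("edges", []) if e.get("type") != "imports"]

    injected = []
    seen = set()
    # Sorted iteration keeps the output deterministic across runs.
    for source_path in sorted(batch_import_data):
        for target_path in batch_import_data[source_path]:
            key = (source_path, target_path)
            if key in seen:
                continue  # skip duplicate import pairs
            seen.add(key)
            injected.append({
                "source": f"file:{source_path}",  # assumed ID prefix
                "target": f"file:{target_path}",
                "type": "imports",
            })

    # Return a new dict so the caller's graph is not mutated.
    return {**graph, "edges": kept + injected}
```

Because the injected edges depend only on the pre-resolved scanner data, the result does not vary with what the model happened to emit.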

Graceful fallback: If Gemma fails (timeout, parse error), the runner writes an empty batch-N.json. The pipeline continues; assemble-reviewer notes the gap and merge-batch-graphs.py handles it.
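A minimal sketch of this fallback, under stated assumptions: the empty batch is taken to be `{"nodes": [], "edges": []}`, and `analyze_batch_with_fallback` and its `analyze` callback are hypothetical names rather than the runner's actual API.

```python
import json
import sys
from pathlib import Path

# Assumed shape of an empty batch; the real schema may carry more fields.
EMPTY_BATCH = {"nodes": [], "edges": []}


def analyze_batch_with_fallback(batch_index, out_dir, analyze):
    """Run `analyze()` and always leave a batch-N.json behind."""
    out_path = Path(out_dir) / f"batch-{batch_index}.json"
    try:
        result = analyze()
    except (RuntimeError, json.JSONDecodeError) as exc:
        # Timeouts and HTTP failures arrive as RuntimeError; bad model
        # output arrives as JSONDecodeError. Either way, keep going.
        print(f"warning: batch {batch_index} failed ({exc}); "
              "writing empty batch", file=sys.stderr)
        result = EMPTY_BATCH
    out_path.write_text(json.dumps(result, indent=2), encoding="utf-8")
    return out_path
```

Writing the file on both paths means downstream steps can rely on every batch index being present.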

Safe for existing users: HYBRID_MODE=false by default. No behavior change unless --hybrid is explicitly passed.

Requirements

Ollama running locally with a capable model (tested: gemma4:26b-a4b, any model ≥13B should work)
Python 3 (stdlib only — no new dependencies)
The extract-structure.mjs script already bundled with the skill runs unchanged

Tested on

gemma4:26b-a4b via Ollama
Calibration: 100% node ID prefix correctness, 100% edge type vocabulary, JSON validity ✅
Output flows correctly through merge-batch-graphs.py normalization

Test plan

python3 hybrid_runner.py --help shows scan/analyze subcommands
Run /understand --hybrid on a small repo (20-50 files) — verify batch-*.json written
Compare assembled graph node/edge counts between standard and hybrid modes
Verify OLLAMA_MODEL env var respected
Verify Ollama-down fallback logs warning and continues
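For the node/edge count comparison in the plan above, a small helper along these lines may be enough. It assumes the assembled graph is a JSON object with `nodes` and `edges` arrays and that each edge has a `type` key; `graph_counts` and `count_diff` are hypothetical names.

```python
from collections import Counter


def graph_counts(graph):
    """Summarize a graph as node count, edge count, and edges per type."""
    edges = graph.get("edges", [])
    return {
        "nodes": len(graph.get("nodes", [])),
        "edges": len(edges),
        "edge_types": Counter(e.get("type") for e in edges),
    }


def count_diff(standard, hybrid):
    """Return hybrid-minus-standard deltas for nodes, edges, and edge types."""
    s, h = graph_counts(standard), graph_counts(hybrid)
    all_types = set(s["edge_types"]) | set(h["edge_types"])
    return {
        "nodes": h["nodes"] - s["nodes"],
        "edges": h["edges"] - s["edges"],
        "edge_types": {
            t: h["edge_types"][t] - s["edge_types"][t] for t in sorted(all_types)
        },
    }
```

Given the import edge injection described earlier, the `imports` delta would be expected to be zero, while other edge types may differ between models.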

🤖 Generated with Claude Code

… via Ollama

Adds hybrid_runner.py and a new --hybrid flag to the /understand skill, routing the expensive file-analyzer extraction phase to a local Gemma model (default: gemma4:26b-a4b) via Ollama instead of Claude subagents. Reduces Claude API cost by ~4-5× on medium/large repos by keeping architectural reasoning on Claude and offloading structural extraction to a local model.

Changes:
- skills/understand/hybrid_runner.py (new): standalone Python script that runs extract-structure.mjs (unchanged), sends batch data to Ollama, and applies post-processing (import edge injection from batchImportData, weight normalization, deduplication). Graceful fallback on Gemma failure.
- skills/understand/SKILL.md: adds --hybrid flag docs, Phase 0 Ollama reachability check (HYBRID_MODE), Phase 2 hybrid dispatch path (5 parallel background python processes writing batch-N.json)

Design notes:
- project-scanner and assemble-reviewer stay on Claude (low cost / need judgment)
- Only imports edges are stripped and reinjected from batchImportData to prevent Gemma hallucinating import relationships (confirmed via calibration probe)
- Configure via OLLAMA_HOST, OLLAMA_MODEL, HYBRID_TIMEOUT env vars
- Tested on gemma4:26b-a4b; compatible with any Ollama-hosted model

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


Lum1104 · 2026-05-23T00:09:32Z

Sounds good. Can this run faster, or does it just reduce token usage in your use case?

@codex review this

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 97a75401f3


chatgpt-codex-connector · 2026-05-23T00:13:43Z

+    except urllib.error.HTTPError as e:
+        body = e.read().decode()[:500]
+        raise RuntimeError(f"Ollama HTTP {e.code}: {body}") from e
+    except TimeoutError:
+        raise RuntimeError(f"Ollama request timed out after {timeout}s") from None


Catch URLError and preserve fallback behavior

call_ollama only converts HTTPError/TimeoutError to RuntimeError, but urllib.request.urlopen raises URLError for common failures like connection refused, DNS failures, and many timeout cases. In those cases run_analyze's except RuntimeError block is skipped, so the process exits with an uncaught exception instead of writing the documented empty batch-<N>.json fallback, which can break hybrid runs when Ollama becomes unavailable mid-analysis.
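One way to address this, sketched under assumptions: the `call_ollama` signature and request body below are illustrative and may not match the PR's actual function. The sketch uses Ollama's `/api/generate` endpoint with streaming disabled and converts every transport failure into `RuntimeError` so the caller's fallback path is reached.

```python
import json
import urllib.error
import urllib.request


def call_ollama(host, model, prompt, timeout=600):
    """POST a prompt to Ollama; raise RuntimeError on any transport failure."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))["response"]
    except urllib.error.HTTPError as e:
        # HTTPError subclasses URLError, so it has to be handled first.
        body = e.read().decode(errors="replace")[:500]
        raise RuntimeError(f"Ollama HTTP {e.code}: {body}") from e
    except urllib.error.URLError as e:
        # Connection refused, DNS failure, or a timeout while connecting.
        raise RuntimeError(f"Ollama unreachable at {host}: {e.reason}") from e
    except TimeoutError:
        # A timeout while reading the response body is raised directly.
        raise RuntimeError(f"Ollama request timed out after {timeout}s") from None
```

The ordering of the `except` clauses matters: placing `URLError` before `HTTPError` would swallow HTTP status errors and lose the response body.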


chatgpt-codex-connector · 2026-05-23T00:13:43Z

+        n_type = node.get("type", "file")
+        if n_type not in VALID_NODE_TYPES:
+            node["type"] = "file"  # fallback


Populate missing node type before emitting batch output

The normalization path only fixes type when it is invalid, not when it is missing: node.get("type", "file") makes the check pass while leaving node["type"] unset. If Gemma omits type for a node, that node is emitted without a required schema field and will be dropped later during graph validation, causing avoidable node loss in hybrid mode.
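A minimal sketch of the unconditional write-back, assuming a placeholder vocabulary: the contents of `VALID_NODE_TYPES` below are invented for illustration, and `normalize_node_type` is a hypothetical helper name.

```python
# Assumed vocabulary for illustration; the real set lives in hybrid_runner.py.
VALID_NODE_TYPES = {"file", "function", "class"}


def normalize_node_type(node):
    """Ensure node["type"] is always present and valid, defaulting to "file"."""
    n_type = node.get("type")
    # The isinstance check guards against unhashable values such as lists.
    is_valid = isinstance(n_type, str) and n_type in VALID_NODE_TYPES
    # Write back unconditionally so a missing key is populated too.
    node["type"] = n_type if is_valid else "file"
    return node
```

With this shape, a node that arrives without a `type` key leaves with `type` set to `"file"` instead of failing schema validation later.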


simkimsia · 2026-05-26T03:23:36Z

If I increase the concurrent batches from 5 to 10, will that speed things up?

Why only file-analyzer? It's ~80% of total token cost — runs in 5 concurrent batches of 20-30 files each. The other agents are either low-cost or require Claude-level reasoning quality.

…ng node type)

P1: call_ollama only caught HTTPError/TimeoutError, so urllib.error.URLError (connection refused when Ollama is down, DNS failures, socket-level timeouts) escaped run_analyze's `except RuntimeError` and crashed the hybrid run instead of writing the documented empty batch-N.json fallback. Catch URLError and wrap it as RuntimeError so graceful fallback works as advertised.

P2: node `type` normalization used `node.get("type", "file")` to test validity but only wrote back on an invalid value; a node with a *missing* type key passed the check while node["type"] stayed unset, causing it to be dropped at schema validation (silent node loss). Write the resolved type back unconditionally.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Itsthewayofyou · 2026-06-24T21:06:06Z

@Lum1104 Good question — it's primarily a cost play, not a speed one. The ~4-5× is token/API cost, not wall-clock. On most setups it's actually a touch slower: the 5 file-analyzer batches all hit a single local Ollama instance, so a large model on one GPU serializes generation, whereas standard mode runs those batches with Claude's cloud parallelism. So: meaningfully cheaper, comparable-to-slightly-slower on speed depending on your local hardware. If latency (not cost) is the goal, standard mode is still the faster path.

Also worth noting the fix I just pushed (a84b942): a connection-refused/DNS failure now correctly falls back to an empty batch instead of crashing the run, and a node emitted without a type no longer gets silently dropped at validation.

Itsthewayofyou · 2026-06-24T21:06:07Z

@simkimsia In hybrid mode, no — and it can make things slightly worse. All batches queue against a single Ollama endpoint, so throughput is bound by the GPU and OLLAMA_NUM_PARALLEL, not by how many batches you fan out. Sending 10 instead of 5 just deepens the queue and adds memory/context pressure on a large model. The levers that actually move hybrid speed are the local model size/quant, OLLAMA_NUM_PARALLEL, and your GPU.

In standard (Claude) mode, raising concurrency does help — up to your API rate limits — since each batch is an independent cloud request.
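The queueing argument can be made concrete with a deliberately simplified model: if every batch takes about the same time, the run completes in a number of "waves" set by the smaller of the client fan-out and the server's parallel slots. The function name and the equal-duration assumption are illustrative only; the model ignores memory pressure and varying batch sizes.

```python
import math


def estimated_waves(total_batches, client_concurrency, server_parallel_slots):
    """Estimate sequential rounds needed when the server caps parallelism."""
    # Work only proceeds as fast as the narrower of the two limits allows.
    effective = max(1, min(client_concurrency, server_parallel_slots))
    return math.ceil(total_batches / effective)


# With 2 server slots, fanning out 5 or 10 batches at once changes nothing.
print(estimated_waves(10, 5, 2))   # 5
print(estimated_waves(10, 10, 2))  # 5
# Only raising the server-side limit shortens the run in this model.
print(estimated_waves(10, 10, 10))  # 1
```

Under this model, client-side concurrency beyond the server's slot count only lengthens the queue, which matches the behavior described in the reply above.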

Resolve SKILL.md conflicts in Phase 1 / Phase 2 sections:
- Keep both upstream's `[Phase 1/7] Scanning...` progress line and the PR's hybrid-mode Phase 1 note.
- Keep the PR's HYBRID_MODE batch-dispatch block; adopt upstream's reworded file-analyzer dispatch line and prompt-template intro (now references batches.json[i]).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>


berryliu85-star mentioned this pull request May 25, 2026

How to make /understand run faster ? #76

Open

Itsthewayofyou wants to merge 3 commits into Egonex-AI:main from Itsthewayofyou:feat/hybrid-gemma-routing


5 participants
