data(estimate): add deepseek-v4-flash benchmark results by k08200 · Pull Request #1294 · wrtnlabs/autobe

k08200 · 2026-04-27T03:52:07Z

Summary

Adds deepseek-v4-flash evaluation reports for the four standard benchmark projects (todo, reddit, shopping, erp) and updates batch-summary.json so the dev surface (autobe.dev) picks up the new model.

Results

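| project | grade | score | gate | doc | req | test | logic | api |
|----------|-------|-------|------|-----|-----|------|-------|-----|
| todo | B | 87 | PASS | 91 | 100 | 79 | 100 | 100 |
| reddit | C | 78 | PASS | 87 | 100 | 74 | 90 | 100 |
| shopping | C | 73 | PASS | 84 | 97 | 74 | 92 | 100 |
| erp | B | 80 | PASS | 91 | 100 | 85 | 92 | 100 |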

Average: 79.5 (B-tier, comparable to claude-sonnet-4.6 / qwen3.5-397b-a17b at 80.0).
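For reference, the 79.5 average can be reproduced from the four per-project scores in the Results table. The snippet below is only a sketch of that arithmetic; `averageScore` is a hypothetical helper, not a function from the autobe codebase.

```typescript
// Scores copied from the Results table in this PR.
const scores: Record<string, number> = {
  todo: 87,
  reddit: 78,
  shopping: 73,
  erp: 80,
};

// Hypothetical helper: arithmetic mean of the per-project scores.
function averageScore(byProject: Record<string, number>): number {
  const values = Object.values(byProject);
  if (values.length === 0) {
    throw new Error("no scores to average");
  }
  const total = values.reduce((sum, value) => sum + value, 0);
  return total / values.length;
}

console.log(averageScore(scores)); // 79.5 (318 / 4)
```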

Changes

- packages/estimate/reports/benchmark/deepseek-v4-flash/{erp,reddit,shopping,todo}/estimate-report.json: new reports
- packages/estimate/reports/benchmark/batch-summary.json: 4 new entries inserted after deepseek-v3.2
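The "inserted after deepseek-v3.2" placement could be done by hand or with a small script. The sketch below shows one way to splice new entries in after the last entry of an anchor model. It assumes batch-summary.json is an array of objects with `model` and `project` fields, which this PR does not confirm; `SummaryEntry`, `insertAfterModel`, and the `some-later-model` entry are hypothetical.

```typescript
// Assumed entry shape; the real batch-summary.json schema may differ.
interface SummaryEntry {
  model: string;
  project: string;
}

// Return a new array with `additions` placed right after the last
// entry whose model equals `anchorModel` (appended if no match).
function insertAfterModel(
  entries: SummaryEntry[],
  anchorModel: string,
  additions: SummaryEntry[],
): SummaryEntry[] {
  let lastIndex = -1;
  for (let i = 0; i < entries.length; i++) {
    if (entries[i].model === anchorModel) lastIndex = i;
  }
  if (lastIndex === -1) return [...entries, ...additions];
  return [
    ...entries.slice(0, lastIndex + 1),
    ...additions,
    ...entries.slice(lastIndex + 1),
  ];
}

const projects = ["todo", "reddit", "shopping", "erp"];

const existing: SummaryEntry[] = [
  ...projects.map((project) => ({ model: "deepseek-v3.2", project })),
  { model: "some-later-model", project: "todo" }, // placeholder entry
];

const additions: SummaryEntry[] = projects.map((project) => ({
  model: "deepseek-v4-flash",
  project,
}));

const merged = insertAfterModel(existing, "deepseek-v3.2", additions);
console.log(merged.map((entry) => `${entry.model}/${entry.project}`));
```

Returning a new array rather than mutating the input keeps the original summary available for a before/after comparison.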

Adds deepseek-v4-flash benchmark reports for the four standard projects and updates batch-summary.json so dev picks up the new model.

| project | grade | score |
|----------|-------|-------|
| todo | B | 87 |
| reddit | C | 78 |
| shopping | C | 73 |
| erp | B | 80 |

…v4-flash

- Add deepseek-v4-flash to MODEL_TO_RESULTS_PATH so the aggregator resolves its autobe-examples directory and enriches pipeline detail.
- Regenerate apps/dashboard-ui/public/benchmark-summary.json and website/public/benchmark/benchmark-summary.json so localhost:3000/benchmark and the deployed dashboard surface the new model.
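As a rough illustration of why the model needs a MODEL_TO_RESULTS_PATH entry, the sketch below shows a model-to-directory lookup in which an unregistered model resolves to nothing. The actual shape and values of MODEL_TO_RESULTS_PATH are not shown in this PR, so `exampleModelToResultsPath`, `resolveResultsPath`, and the path string are all hypothetical placeholders.

```typescript
// Hypothetical stand-in for MODEL_TO_RESULTS_PATH; the real constant's
// type and values in the repository may differ.
const exampleModelToResultsPath = new Map<string, string>([
  // Placeholder value, not the real autobe-examples directory.
  ["deepseek-v4-flash", "PLACEHOLDER/deepseek-v4-flash"],
]);

// Return the results directory for a model, or undefined when the
// model has not been registered in the mapping.
function resolveResultsPath(model: string): string | undefined {
  return exampleModelToResultsPath.get(model);
}

console.log(resolveResultsPath("deepseek-v4-flash"));
console.log(resolveResultsPath("unregistered-model"));
```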

k08200 added 2 commits April 27, 2026 12:51

k08200 merged commit 20ad514 into main Apr 27, 2026
2 checks passed

k08200 deleted the benchmark/deepseek-v4-flash branch April 27, 2026 04:05

k08200 merged 2 commits into main from benchmark/deepseek-v4-flash
