Goal
Benchmark local model candidates for smdu AI tasks and produce a shortlist.
Questions to answer
- Which model sizes are viable on common developer machines?
- What latency/memory profile is acceptable for interactive TUI usage?
- Which tasks require embeddings vs generation vs classification?
Candidate families
- Qwen2.5 instruct variants (0.5B, 1.5B, 3B)
- Qwen2.5 Coder instruct variants (for path/action reasoning)
- compact embedding models compatible with transformers.js
Evaluation matrix
- cold start time
- warm inference latency
- memory footprint (idle and active)
- quality on task-specific prompts
- download size and first-run impact
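The timing and memory rows of the matrix could be collected with a small harness along the following lines. This is a minimal sketch, not an existing smdu module: `Candidate`, `BenchResult`, `percentile`, `rssMb`, and `benchmark` are hypothetical names, and `load` and `run` are assumed to wrap whatever runtime call actually loads a model and runs one prompt (for example a transformers.js pipeline). Download size and output quality are not measured here.

```typescript
// Hypothetical benchmark harness; all names are illustrative assumptions.
interface Candidate<M> {
  name: string;
  load: () => Promise<M>; // assumed to wrap the real model-loading call
  run: (model: M, prompt: string) => Promise<unknown>; // one inference
}

interface BenchResult {
  name: string;
  coldStartMs: number; // load plus first inference
  warmMedianMs: number;
  warmP95Ms: number;
  rssIdleMb: number | null; // after load, before any inference
  rssActiveMb: number | null; // highest RSS sampled after each warm run
}

// Nearest-rank percentile over a sorted copy of the samples (p in 0..100).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Resident set size in MiB, or null when not running under Node.js.
function rssMb(): number | null {
  const proc = (globalThis as any).process;
  return typeof proc?.memoryUsage === "function"
    ? proc.memoryUsage().rss / (1024 * 1024)
    : null;
}

async function benchmark<M>(
  c: Candidate<M>,
  prompt: string,
  warmRuns = 10,
): Promise<BenchResult> {
  const t0 = Date.now();
  const model = await c.load();
  const rssIdleMb = rssMb();
  await c.run(model, prompt); // first inference counts toward cold start
  const coldStartMs = Date.now() - t0;

  const warm: number[] = [];
  let rssActiveMb = rssIdleMb;
  for (let i = 0; i < warmRuns; i++) {
    const t = Date.now();
    await c.run(model, prompt);
    warm.push(Date.now() - t);
    const now = rssMb();
    if (now !== null && (rssActiveMb === null || now > rssActiveMb)) {
      rssActiveMb = now;
    }
  }

  return {
    name: c.name,
    coldStartMs,
    warmMedianMs: percentile(warm, 50),
    warmP95Ms: percentile(warm, 95),
    rssIdleMb,
    rssActiveMb,
  };
}
```

Each candidate should run in a fresh process so that one model's cold start and RSS are not skewed by a previously loaded model. RSS sampled after each run is only a rough proxy for active memory, and it may miss allocations made outside the Node.js process.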
Acceptance Criteria
- at least 3 viable candidates benchmarked end-to-end
- one recommended default model per task type (summary/query/tagging)
- explicit fallback strategy for low-memory environments
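One possible shape for the low-memory fallback is a tier table consulted at startup, sketched below. `Tier` and `pickModel` are hypothetical names, and the `minFreeMb` values are placeholders rather than measurements; the benchmark results would supply the real thresholds.

```typescript
// Hypothetical fallback selection; thresholds are placeholders, not measured.
interface Tier {
  model: string;
  minFreeMb: number; // free memory assumed necessary to run this model
}

// Returns the most demanding tier that fits, or null if none fits.
function pickModel(freeMb: number, tiers: Tier[]): string | null {
  const fitting = tiers
    .filter((t) => t.minFreeMb <= freeMb)
    .sort((a, b) => b.minFreeMb - a.minFreeMb);
  return fitting.length > 0 ? fitting[0].model : null;
}

// Placeholder tiers built from the candidate sizes listed in this document.
const exampleTiers: Tier[] = [
  { model: "Qwen2.5 instruct 3B", minFreeMb: 4096 },
  { model: "Qwen2.5 instruct 1.5B", minFreeMb: 2048 },
  { model: "Qwen2.5 instruct 0.5B", minFreeMb: 1024 },
];
```

A `null` result means no candidate fits, which leaves the caller to decide whether to disable the AI feature or degrade it in some other way.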