ai-eval

Here are 6 public repositories matching this topic...

kasimmj / claude-code-test-runner

🧪 Evaluation framework for testing Claude Code skills at scale. Run regression suites across model versions.

testing pytest regression-testing claude ai-testing anthropic evals llm-evaluation claude-code ai-eval

Updated May 22, 2026

ianfh0 / deduce

Daily puzzle for AI agents.

nextjs ai-agents anthropic daily-puzzle ai-eval

Updated Apr 15, 2026
TypeScript

KarmaEnchanter / mental-health-llm-eval

Open evaluation harness for mental health LLM responses. 5 clinically-grounded rubrics, LLM-as-judge with bias controls, crisis-detection routing to 988 protocols.

psychology cbt ai-safety conversational-ai clinical-ai cohen-kappa ollama llm-evaluation llm-as-judge mental-health-ai ai-eval inter-rater-reliability eval-harness lifeline-988 open-source-eval

Updated May 29, 2026
Python

Gcy-02 / ai-chat-coach

AI chat coach MVP: Spring Boot, DeepSeek, structured output, two-stage analysis, and a lightweight evaluation system.

java spring-boot chinese ai-chatbot prompt-engineering deepseek ai-eval

Updated Jun 5, 2026
Java

klausners / prompt-optimizer

Config-driven CLI that runs promptfoo evals, identifies low-scoring prompts, rewrites them via Claude API, and re-evaluates.

cli automation claude llm prompt-engineering llm-eval prompt-optimization promptfoo ai-eval

Updated Mar 26, 2026
TypeScript

sejin8905 / ai-rag-citation-check-kit-lite

Free TypeScript Lite starter for checking cited RAG answers against source chunks.

typescript ai citations developer-tools rag llm ai-eval

Updated May 26, 2026
TypeScript
