deepeval-metrics

Here are 4 public repositories matching this topic...

sunilp303 / deepeval-evaluation-harness

Pluggable DeepEval scaffold for RAG, agents, and LLM apps across Anthropic, Bedrock, Azure OpenAI, and Vertex. Ships traceability, test synthesis, safety/PII gating, multi-turn conversation eval, agentic tool-use scoring, JSON validation, judge benchmarks, hyperparameter sweeps, and pytest CI, with one Makefile target per feature.

bedrock evaluation-metrics vertex-ai azure-openai anthropic ollama rag-evaluation ragas deepeval harness-ai deepeval-metrics

Updated Jun 3, 2026
Python

sunilp303 / trulens-agent-starter

Drop-in TruLens evaluation harness for tool-calling LangGraph agents. Swap LLM providers (OpenAI, Anthropic via LiteLLM, Bedrock, Cortex, Gemini, Ollama) with a single env var. Ships with the RAG Triad plus Plan Quality, Plan Adherence, Execution Efficiency, and Logical Consistency metrics.

bedrock evaluation-metrics vertex-ai azure-openai anthropic ollama rag-evaluation trulens ragas deepeval harness-ai deepeval-metrics

Updated Jun 3, 2026
Python

Naveen-Ravichandran003 / llm-evaluation-deepeval

A production-ready LLM evaluation framework built with DeepEval and a custom Groq-powered judge (LLaMA 3.3 70B). Evaluates LLM outputs across five key metrics (Answer Relevancy, Faithfulness, Hallucination, Toxicity, and G-Eval) without requiring an OpenAI API key. It is fast and free-tier compatible.

llm llm-evaluation-framework llm-testing deepeval llm-judge groq-llama3 deepeval-metrics

Updated May 9, 2026
Python

nipunkhanderia / rag-eval-project

RAG pipeline with LLM evaluation using DeepEval, Ollama/Groq judge models, and Langfuse observability.

ai llama metrics-gathering rag groq ollama langfuse deepeval deepeval-metrics

Updated Jun 10, 2026
Python
