deepeval-metrics

Here are 4 public repositories matching this topic...

Pluggable DeepEval scaffold for RAG, agents, and LLM apps across Anthropic, Bedrock, Azure OpenAI, and Vertex. Ships traceability, test synthesis, safety/PII gating, multi-turn conversation eval, agentic tool-use scoring, JSON validation, judge benchmarks, hyperparameter sweeps, and pytest CI — one Makefile target per feature.

  • Updated Jun 3, 2026
  • Python

Drop-in TruLens evaluation harness for tool-calling LangGraph agents. Swap LLM providers (OpenAI, Anthropic via LiteLLM, Bedrock, Cortex, Gemini, Ollama) with a single env var. Ships with the RAG Triad plus Plan Quality, Plan Adherence, Execution Efficiency, and Logical Consistency metrics.

  • Updated Jun 3, 2026
  • Python
