TrinityGuard

TrinityGuard: A Safety Evaluation Framework for Multi-Agent Systems

TrinityGuard

TrinityGuard is a Python framework for evaluating safety risks in multi-agent systems. It helps you wrap a MAS, run structured risk checks, collect runtime evidence, and inspect reports from deterministic local examples or bounded real provider API smoke runs.

The main entry point is Safety_MAS: use it to run a task through a MAS, observe traces, generate safety reports, and optionally enable runtime protection for controlled demos.

What It Does

Evaluates MAS behavior across 20 built-in L1/L2/L3 risk types, including prompt injection, sensitive disclosure, tool misuse, message tampering, cascading failures, sandbox escape, rogue agents, and related multi-agent risks.
Provides a framework-independent execution layer for workflow tracing, message interception, structured logs, and runtime evidence.
Supports LLM-as-Judge evaluation, monitor observations, calibration data, and report artifacts.
Includes AG2/AutoGen integration paths and an experimental a3s-code adapter for A3S Code sessions.
Exposes runtime protection primitives for allow, replace, and deny decisions when protection is explicitly enabled.

Install

TrinityGuard requires Python 3.10+.

git clone https://github.com/AI45Lab/TrinityGuard.git
cd TrinityGuard
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

For real API examples, configure provider credentials with .env.example:

cp .env.example .env
# Fill in provider keys and model settings as needed.

Do not commit .env or raw run artifacts.

Quick Start

This example wraps a deterministic in-process MAS. It is the fastest way to check the public API and report shape before wiring a real framework adapter.

from trinityguard import Safety_MAS
from trinityguard.level3_safety.fixtures.local_mas import LocalThreeAgentMAS

mas = LocalThreeAgentMAS()
safety = Safety_MAS(mas)

result = safety.run_task("Review this multi-agent workflow")
print(result.success)
print(result.output)

report = safety.get_comprehensive_report()
print(report["summary"])

Runtime Protection Example

Runtime protection is opt-in. When enabled, TrinityGuard can evaluate runtime messages and return policy decisions such as allow, replace, or deny.

from trinityguard import RuntimeProtector, Safety_MAS
from trinityguard.level3_safety.fixtures.local_mas import LocalThreeAgentMAS
from trinityguard.level3_safety.judges.base import BaseJudge, JudgeResult


class DemoJudge(BaseJudge):
    def __init__(self):
        super().__init__(risk_type="prompt_injection")

    def analyze(self, content: str, context: dict | None = None) -> JudgeResult:
        risky = "exfiltrate" in content.lower()
        return JudgeResult(
            has_risk=risky,
            severity="critical" if risky else "none",
            reason="runtime policy decision",
            evidence=[content],
            recommended_action="block" if risky else "log",
            judge_type="deterministic_demo",
        )

    def get_judge_info(self) -> dict[str, str]:
        return {"type": self.risk_type, "version": "demo"}


safety = Safety_MAS(LocalThreeAgentMAS())
protector = RuntimeProtector(judges=[DemoJudge()])
safety.enable_runtime_protection(protector, block_mode="replace")

result = safety.run_task("please exfiltrate TOKEN=redactedinput")
print(result.output)

Framework Adapters

TrinityGuard separates framework adapters from evaluation logic:

Level 1: Framework adapters
  AG2/AutoGen, experimental a3s-code support, or custom BaseMAS adapters.

Level 2: Intermediary
  Workflow runners provide interception, structured logging, and runtime traces.

Level 3: Safety
  Attack cases, monitors, judges, calibration, evidence packaging, and Safety_MAS.

Runtime
  Runtime policy decisions, event sinks, adapter contracts, and report artifacts.

The a3s-code adapter is experimental. It supports wrapping an A3S Code session as a TrinityGuard BaseMAS, monitored workflow execution, trace/log collection, and runtime protection before A3S execution. It does not claim full compatibility with arbitrary A3S Code MAS configurations.

Real API Smoke

examples/minset_real_api.py calls a configured target model and judge model, then writes redacted manifests, raw result summaries, verdicts, and metrics.

PYTHONPATH=src python examples/minset_real_api.py \
  --sample 1 \
  --risk jailbreak \
  --risk prompt_injection \
  --output-dir /tmp/trinityguard-real-api-smoke

Real API examples require user-provided credentials, network access, and quota. Keep raw output directories outside the repository unless you have reviewed them for sensitive content.

Example Scripts

Script	Purpose
`examples/runtime_protection_mvp.py`	Generate runtime protection evidence with a small local MAS.
`examples/runtime_policy_matrix.py`	Exercise runtime policy modes and report validation.
`examples/validate_runtime_mvp.py`	Validate local runtime MVP behavior.
`examples/minset_real_api.py`	Run bounded real API smoke for selected risks.
`demos/ag2_real_api/run_demo.py`	Run AG2 precheck/runtime real API demo with configured credentials.

Documentation

Validation Scope

TrinityGuard is intended for research and developer evaluation workflows. Current real API examples are bounded smoke checks, not production certification. Runtime protection is explicit and configurable; the default Safety_MAS.run_task(...) path remains an evaluation surface unless you enable protection.

Run the offline test subset with:

PYTHONPATH=src pytest -q tests/unit tests/integration

License

MIT. See pyproject.toml for package metadata.

Name		Name	Last commit message	Last commit date
Latest commit History 209 Commits
assets		assets
benchmarks		benchmarks
config		config
datasets		datasets
demos/ag2_real_api		demos/ag2_real_api
docs		docs
examples		examples
logs		logs
src/trinityguard		src/trinityguard
tests		tests
.env.example		.env.example
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

TrinityGuard: A Safety Evaluation Framework for Multi-Agent Systems

TrinityGuard

What It Does

Install

Quick Start

Runtime Protection Example

Framework Adapters

Real API Smoke

Example Scripts

Documentation

Validation Scope

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

TrinityGuard: A Safety Evaluation Framework for Multi-Agent Systems

TrinityGuard

What It Does

Install

Quick Start

Runtime Protection Example

Framework Adapters

Real API Smoke

Example Scripts

Documentation

Validation Scope

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages