Adversarial Testing Lab for Agentic Safeguards (ATLAS). A synthetic multi-agent eval environment for adversarial fraud decisioning inspired by Anthropic's Project Deal. Measures how model quality, …
An eval and observability cockpit for coding agents. It runs policy-controlled coding agents in sandboxed toy repos, tool-use traces, MCP tools, compares harness policies, scores recovery and safet…
Do agents make for good offensive & defensive coordinators in football? This is an adversarial-agent arena for short red-zone strategy contests. OC & DC agents compete through simultaneous legal pl…
Can you eval an art form? Canon is a continuity linter for serialized TV, YouTube and micro-drama fiction. Canon plays the role of what's currently the scriptwriting coordinator, verifies your story…
Python