feat: Add GovernanceDecision and GovernanceOutcome contract types #6030

Open
nagasatish007 wants to merge 21 commits into crewAIInc:main from nagasatish007:feat/governance-decision-contract

Conversation

nagasatish007 commented Jun 3, 2026

Summary

Adds a vendor-neutral GovernanceDecision TypedDict contract that crew-level governance hooks (before_tool_call / after_tool_call from #5890) can optionally return. Also adds GovernanceOutcome as the linked post-execution record.

This gives the hook one return shape that works for blocking, approval, resume, and audit — without coupling CrewAI to any vendor's evidence format.

Closes #5888 (governance decision contract portion)

Design Principles

  • CrewAI owns the envelope — the type lives in CrewAI core, no vendor imports
  • Vendor-neutral — TealTiger, Neura Relay, Vaara, or any engine can implement this contract
  • Extensions are pass-through: the extensions field (dict[str, Any]) is never validated by CrewAI. Vendors attach their evidence (signed receipts, Merkle proofs, etc.) under their own namespace key
  • Two-record boundary: GovernanceDecision is pre-execution, GovernanceOutcome is post-execution, linked via decision_id
  • Serialized contract first — the TypedDict defines the JSON wire format; implementations can wrap it in a dataclass if preferred

Files

| File | Purpose |
| --- | --- |
| lib/crewai/src/crewai/governance/governance_decision.py | GovernanceDecision + GovernanceOutcome TypedDicts |
| lib/crewai/tests/governance/test_governance_decision_contract.py | Contract-test fixtures for all 4 decision routes + extension round-trip |

Contract Fields (GovernanceDecision)

class GovernanceDecision(TypedDict, total=False):
    decision_id: str
    agent_id: str
    agent_role: str
    tool: str
    request_id: str
    params_hash: str        # SHA-256 of JCS-canonicalized params
    target: str
    policy_refs: list[str]
    decision: Literal["allow", "deny", "require_approval", "revise"]
    reason: str
    issued_at: str          # ISO 8601
    expires_at: str | None
    supersedes: str | None
    revalidate_if: list[str]
    evidence_refs: list[str]
    extensions: dict[str, Any]  # vendor pass-through
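
To make the field list concrete, here is a minimal sketch of how a hook might fill in a small decision. It rests on several assumptions: DecisionSketch is a trimmed local stand-in for the TypedDict above so the snippet runs without this branch, approx_params_hash is a hypothetical helper name, and the sorted-key json.dumps call only approximates JCS (RFC 8785) for simple payloads. A real implementation would need a proper JCS canonicalizer.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Any, Literal, TypedDict


class DecisionSketch(TypedDict, total=False):
    """Trimmed local stand-in for GovernanceDecision (illustration only)."""

    decision_id: str
    tool: str
    params_hash: str
    decision: Literal["allow", "deny", "require_approval", "revise"]
    reason: str
    issued_at: str
    extensions: dict[str, Any]


def approx_params_hash(params: dict[str, Any]) -> str:
    """Hypothetical helper: SHA-256 over a JCS-like serialization.

    Assumption: sorted keys plus compact separators approximates RFC 8785
    only for simple payloads (ASCII keys, no floats). It is not a JCS
    implementation.
    """
    canonical = json.dumps(
        params, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


decision: DecisionSketch = {
    "decision_id": "d-001",
    "tool": "search",
    "params_hash": approx_params_hash({"query": "status", "limit": 5}),
    "decision": "allow",
    "reason": "policy matched",
    "issued_at": datetime.now(timezone.utc).isoformat(),
    "extensions": {},
}
```

Because the hash is taken over a key-sorted form, two callers that build the same params in a different key order get the same params_hash.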

Test Fixtures

| Fixture | Decision | Tests |
| --- | --- | --- |
| ALLOW | allow | Minimum valid decision |
| DENY | deny | Policy ref present |
| REQUIRE_APPROVAL | require_approval | Has expires_at |
| REVISE | revise | Has revalidate_if |
| ALLOW + extension | allow | TEEC evidence under extensions["teec"] |
| Unknown extension | allow | Arbitrary vendor payload round-trips unchanged |
| Outcome | executed | Links back via decision_id |

Extension Round-Trip Test

The key test: an unknown vendor extension (extensions["custom_vendor"]) with nested dicts, arrays, and unicode passes through JSON serialize → deserialize without any data loss or validation failure. This proves CrewAI doesn't filter, validate, or modify vendor evidence.
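
A round-trip check of this kind can be sketched as follows. This is not the test from the PR: the nested contents under extensions["custom_vendor"] are invented for illustration, and a plain dict stands in for the TypedDict so the snippet is self-contained.

```python
import json
from typing import Any

# Hypothetical vendor payload. The "custom_vendor" key mirrors the description
# above; everything nested under it is made up for this sketch.
decision: dict[str, Any] = {
    "decision_id": "d-006",
    "decision": "allow",
    "reason": "ok",
    "extensions": {
        "custom_vendor": {
            "receipt": {"sig": "abc123", "chain": [1, 2, {"depth": 3}]},
            "label": "審査済み ✓",
            "flags": [True, None, 0.5],
        }
    },
}

# Serialize to the JSON wire format and read it back.
wire = json.dumps(decision, ensure_ascii=False)
restored = json.loads(wire)

# Nothing under extensions was filtered, coerced, or reordered in a way
# that changes equality.
assert restored == decision
assert restored["extensions"]["custom_vendor"]["label"] == "審査済み ✓"
```

The same equality holds with the default ensure_ascii=True; only the bytes on the wire differ, not the decoded value.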

Backward Compatibility

Co-authored with

Co-authored-by: rpelevin rpelevin@users.noreply.github.com

References

Summary by CodeRabbit

  • New Features

    • Added governance decision and outcome contract types for authorization metadata, including verdicts, expiry/revalidation hints, evidence references, and vendor-specific extensions
    • Exposed these governance types at the governance package level for easier import
  • Tests

    • Added tests validating JSON round-trip serialization, extension payload passthrough, required fields, expiry/revalidate semantics, error outcomes, and decision→outcome linkage

This module defines the GovernanceDecision and GovernanceOutcome TypedDicts for vendor-neutral governance hooks in CrewAI. It specifies the structure and fields for pre-execution and post-execution records used in governance processes.
This file contains contract tests for GovernanceDecision and GovernanceOutcome, validating decision routes, JSON serialization, and outcome references.

coderabbitai Bot commented Jun 3, 2026

Caution

Review failed

An error occurred during the review process. Please try again later.

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (2)
lib/crewai/src/crewai/governance/__init__.py (1)

1-2: ⚡ Quick win

Export types from __init__.py for better ergonomics.

The governance module's __init__.py is empty, requiring users to import from the full submodule path. Exporting the public types would make imports more convenient and discoverable.

📦 Proposed export pattern
+"""
+CrewAI governance contracts.
+"""
+
+from crewai.governance.governance_decision import (
+    GovernanceDecision,
+    GovernanceOutcome,
+)
+
+__all__ = ["GovernanceDecision", "GovernanceOutcome"]

This allows cleaner imports:

from crewai.governance import GovernanceDecision, GovernanceOutcome

instead of:

from crewai.governance.governance_decision import GovernanceDecision, GovernanceOutcome
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/crewai/src/crewai/governance/__init__.py` around lines 1 - 2, Add public
exports to the governance package by importing and re-exporting the key types in
the package __init__.py: import GovernanceDecision and GovernanceOutcome (and
any other public classes) from their defining module (e.g., governance_decision)
and add them to the package's __all__ so users can do "from crewai.governance
import GovernanceDecision, GovernanceOutcome"; ensure you reference the exact
class names GovernanceDecision and GovernanceOutcome and any other public
symbols you want exposed.
test_governance_decision_contract.py (1)

98-103: ⚡ Quick win

Consider adding error outcome fixture for completeness.

The current outcome fixture only tests outcome="executed". Since GovernanceOutcome defines error_type and error_message fields and includes "error" as a valid outcome, adding a fixture that demonstrates the error case would strengthen contract coverage.

🧪 Proposed error outcome fixture
FIXTURE_OUTCOME_ERROR: GovernanceOutcome = {
    "decision_id": "d-002",  # Link to FIXTURE_DENY or any decision
    "outcome": "error",
    "error_type": "PermissionError",
    "error_message": "Insufficient permissions to execute tool",
    "completed_at": "2026-06-03T14:01:05Z",
}

Then add to test_all_fixtures_json_serializable:

     fixtures: list[dict[str, Any]] = [
         FIXTURE_ALLOW,
         FIXTURE_DENY,
         FIXTURE_REQUIRE_APPROVAL,
         FIXTURE_ALLOW_WITH_EXTENSION,
         FIXTURE_REVISE,
         FIXTURE_OUTCOME,
+        FIXTURE_OUTCOME_ERROR,
         FIXTURE_UNKNOWN_EXTENSION,
     ]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test_governance_decision_contract.py` around lines 98 - 103, Add an
error-case fixture for GovernanceOutcome named FIXTURE_OUTCOME_ERROR that sets
"outcome" to "error" and includes "error_type" and "error_message" plus
"decision_id" and "completed_at" to mirror FIXTURE_OUTCOME structure; then
include FIXTURE_OUTCOME_ERROR in the test_all_fixtures_json_serializable
assertions so the error-path (outcome="error") is covered by serialization tests
(reference symbols: FIXTURE_OUTCOME, FIXTURE_OUTCOME_ERROR, GovernanceOutcome,
test_all_fixtures_json_serializable).

📥 Commits

Reviewing files that changed from the base of the PR and between 68cdd44 and 6ff28b2.

📒 Files selected for processing (3)
  • governance_decision.py
  • lib/crewai/src/crewai/governance/__init__.py
  • test_governance_decision_contract.py

rpelevin commented Jun 3, 2026

Thanks, Naga. Directionally this is the right contract shape.

One mechanical thing I would fix before merge: governance_decision.py appears to be added at repo root, while the test imports crewai.governance.governance_decision. I think the contract should live at lib/crewai/src/crewai/governance/governance_decision.py, with GovernanceDecision and GovernanceOutcome exported from lib/crewai/src/crewai/governance/__init__.py.

I would also move the contract test into CrewAI's test tree, e.g. lib/crewai/tests/governance/test_governance_decision_contract.py, so it runs with the package layout.

Field names and boundary look right to me: decision_id, params_hash, policy_refs, revalidate_if, evidence_refs, extensions, and separate GovernanceOutcome linked by decision_id. The unknown-extension round-trip test is exactly the right vendor-neutral proof.

Optional but useful: add one outcome="error" fixture so error_type / error_message are covered too.

This file contains contract tests for GovernanceDecision and GovernanceOutcome, validating decision routes, JSON serialization, and outcome references.
This module defines the GovernanceDecision and GovernanceOutcome types for vendor-neutral governance hooks in CrewAI, including their fields and documentation.
Added test for error outcomes to validate error_type and error_message fields.
@nagasatish007

Author

@rpelevin the changes have been done.

rpelevin commented Jun 4, 2026

Thanks, Naga. I went through the latest commits. This addresses the core things I called out: the contract now lives under crewai.governance, the types are exported from the package, GovernanceDecision and GovernanceOutcome stay separate with decision_id linking them, and the extension round-trip plus error-outcome coverage are now in the tests.

From my side the field names and vendor-neutral boundary look right. I would avoid adding more scope here and let the CrewAI maintainers decide the final test layout and merge mechanics. Great work getting this tightened up.

nagasatish007 and others added 5 commits June 13, 2026 19:47
Adds monotonic seq and running_count fields to GovernanceDecision and
GovernanceOutcome as core (non-extension) fields for completeness evidence.

A verifier holding N records can prove no records were dropped:
- seq must form contiguous 1..N (gap = provable omission)
- max(running_count) must equal len(records)

Includes verify_contiguity() utility and 4 contract tests for the gap case.

Ref: vaaraio/vaara#283 (working implementation)
Co-authored-by: rpelevin
Add seq and running_count fields to governance decision fixtures and tests for omission detection.

vaaraio commented Jun 21, 2026

@nagasatish007 here are the contract-test fixtures you invited, pinned to the base the shipped reference uses and extended to the tail-drop case the contract cannot express yet.

On the index base. The fixtures and the verifier have to agree, or seq contiguity breaks silently on day one. The working implementation you cited (vaaraio/vaara#283) is 0-indexed: the first decision in a run is seq: 0, running_count: 1, and the verifier expects a contiguous 0..N-1 with running_count == seq + 1 for every record. The contract draft is 1-indexed (1..N). max(running_count) == len(records) holds either way, but if the contract pins 1-indexed and a 0-indexed stream from the reference impl reaches the verifier, seq 0 reads as a missing record. The published spec pins 0-indexed (vaara.receipt/v1 SPEC.md 5.3, draft-sirkkavaara-vaara-receipt-00), so the cheapest way to make the contract and that reference impl agree is to pin the base the impl already uses.

On the sealing record. The per-record running_count cannot see a pure tail truncation. Drop the last decision and it takes its own seq and running_count with it, so the held set reads as complete. A terminal sealing record closes that: it pins the run's final count independently, so the holder expects max(seq + 1, running_count, total) records and a dropped tail shows as the missing range up to total. It is additive, so a run that is never finalized verifies exactly as before. The honest residual, pinned in the fixtures below: a suffix drop that also suppresses the seal stays invisible from the held set alone. No field closes that one. An external rfc3161 anchor over the run is what does.

Fixtures and a sealing-aware verifier. Every assertion here also passes against the shipped vaara.credential.verify_contiguity, so the contract and the reference impl give the same verdict on each case.

from collections import Counter


def verify_contiguity(records: list[dict]) -> bool:
    """0-indexed, sealing-aware. expected = max(max_seq + 1, max_running, sealed_total)."""
    seq_records = [r for r in records if not r.get("sealed")]
    sealed_total = max((int(r["total"]) for r in records if r.get("sealed")), default=0)
    if not seq_records:
        return sealed_total == 0  # a seal over zero held records is a fully dropped run
    seqs = [int(r["seq"]) for r in seq_records]
    counts = [int(r["running_count"]) for r in seq_records]
    expected = max(max(seqs) + 1, max(counts), sealed_total)
    duplicates = [s for s, n in Counter(seqs).items() if n > 1]
    missing = sorted(set(range(expected)) - set(seqs))
    count_mismatch = [r for r in seq_records if int(r["running_count"]) != int(r["seq"]) + 1]
    return not missing and not duplicates and not count_mismatch and len(seq_records) == expected


# 0-indexed: the first decision is seq 0, running_count 1.
FIXTURE_CONTIGUOUS_RUN = [
    {"decision_id": "d-101", "tool": "search", "decision": "allow", "seq": 0, "running_count": 1},
    {"decision_id": "d-102", "tool": "calc", "decision": "allow", "seq": 1, "running_count": 2},
    {"decision_id": "d-103", "tool": "write", "decision": "deny", "seq": 2, "running_count": 3},
]

FIXTURE_SEQ_GAP = [
    {"decision_id": "d-201", "tool": "search", "decision": "allow", "seq": 0, "running_count": 1},
    {"decision_id": "d-202", "tool": "calc", "decision": "allow", "seq": 1, "running_count": 2},
    # seq 2 missing: provable interior gap
    {"decision_id": "d-204", "tool": "deploy", "decision": "allow", "seq": 3, "running_count": 4},
]

FIXTURE_RUNNING_COUNT_MISMATCH = [
    {"decision_id": "d-301", "tool": "search", "decision": "allow", "seq": 0, "running_count": 1},
    # running_count 4 at seq 1 says a later record existed; only 2 held
    {"decision_id": "d-302", "tool": "calc", "decision": "allow", "seq": 1, "running_count": 4},
]

# Tail drop CAUGHT by the seal: held 0..2, seal pins total 4, the 4th decision is gone.
FIXTURE_TAIL_DROP_SEALED = [
    {"decision_id": "d-401", "tool": "search", "decision": "allow", "seq": 0, "running_count": 1},
    {"decision_id": "d-402", "tool": "calc", "decision": "allow", "seq": 1, "running_count": 2},
    {"decision_id": "d-403", "tool": "write", "decision": "allow", "seq": 2, "running_count": 3},
    {"boundary_id": "crew-run-1", "sealed": True, "total": 4},
]

# The same tail drop WITHOUT the seal reads as complete: the irreducible residual.
FIXTURE_TAIL_DROP_NO_SEAL = FIXTURE_TAIL_DROP_SEALED[:3]

# A finalized run with no drop passes.
FIXTURE_SEALED_WHOLE = FIXTURE_CONTIGUOUS_RUN + [{"boundary_id": "crew-run-1", "sealed": True, "total": 3}]


def test_contiguous_run_passes():
    assert verify_contiguity(FIXTURE_CONTIGUOUS_RUN) is True

def test_interior_gap_fails():
    assert verify_contiguity(FIXTURE_SEQ_GAP) is False

def test_running_count_omission_fails():
    assert verify_contiguity(FIXTURE_RUNNING_COUNT_MISMATCH) is False

def test_tail_drop_caught_by_seal():
    assert verify_contiguity(FIXTURE_TAIL_DROP_SEALED) is False

def test_tail_drop_without_seal_is_the_residual():
    assert verify_contiguity(FIXTURE_TAIL_DROP_NO_SEAL) is True

def test_sealed_whole_run_passes():
    assert verify_contiguity(FIXTURE_SEALED_WHOLE) is True

I can open this as a PR against your branch instead, if you would rather review it as a diff. Tag me either way.

@nagasatish007

Author

@vaaraio — these fixtures are exactly what the PR needs. The sealing-aware verifier and the tail-drop test pair (test_tail_drop_caught_by_seal + test_tail_drop_without_seal_is_the_residual) document the honest residual cleanly.

On the index base: You've convinced me. Switching the contract to 0-indexed. Your argument is right: the reference impl is 0-indexed, the published spec pins 0-indexed, and asking every 0-indexed emitter to add +1 on the wire is friction that produces silent interop bugs. I'll update the PR to use seq: 0..N-1 with running_count == seq + 1. The verifier invariant becomes sorted(seqs) == list(range(expected)) — clean.

On the seal: Adding the sealed record shape to the contract. The layering is:

  1. seq → ordering (0-indexed)
  2. running_count == seq + 1 → per-record consistency
  3. hash chain → tamper-evidence (under extensions)
  4. seal → tail-drop detection (total pins expected count)
  5. RFC 3161 external anchor → residual closure (under extensions, optional)

The irreducible residual (suffix drop + suppressed seal) is honest and documented — no field can close it. That's the right thing to say in the contract spec rather than pretend it's solved.
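
As an illustration of layer 3, here is a minimal hash-chain sketch. It is not the format any vendor in this thread uses: the extensions["example_chain"] key, the "prev" field, and the function names are hypothetical, and the digest uses sorted-key JSON as a stand-in for JCS.

```python
import hashlib
import json
from typing import Any


def record_digest(record: dict[str, Any]) -> str:
    """SHA-256 over a sorted-key JSON form (assumed stand-in for JCS)."""
    body = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def link_records(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return copies in which each record stores its predecessor's digest."""
    linked: list[dict[str, Any]] = []
    prev = ""  # an empty string marks the start of the chain
    for rec in records:
        extensions = {**rec.get("extensions", {}), "example_chain": {"prev": prev}}
        out = {**rec, "extensions": extensions}
        linked.append(out)
        prev = record_digest(out)
    return linked


def chain_intact(linked: list[dict[str, Any]]) -> bool:
    """Recompute each predecessor digest and compare it with the stored link."""
    prev = ""
    for rec in linked:
        if rec.get("extensions", {}).get("example_chain", {}).get("prev") != prev:
            return False
        prev = record_digest(rec)
    return True


run = link_records([
    {"decision_id": "d-1", "decision": "allow", "seq": 0, "running_count": 1},
    {"decision_id": "d-2", "decision": "deny", "seq": 1, "running_count": 2},
    {"decision_id": "d-3", "decision": "allow", "seq": 2, "running_count": 3},
])
```

In this sketch, editing an earlier record or dropping an interior one breaks the chain, while chain_intact(run[:2]) still passes. A backward-linked chain therefore has the same tail-drop blind spot as seq and running_count, which is why the seal and the external anchor sit in separate layers.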

Please open the PR against the branch (nagasatish007:feat/governance-decision-contract). I'll review and merge your fixtures directly. You'll be co-authored on the final commit.

The branch is open for push at: https://github.com/nagasatish007/crewAI/tree/feat/governance-decision-contract

Appreciate the rigor here — this thread produced a better contract than any of us would have written alone.

@safal207 safal207 left a comment

Strong direction, and the issue-thread mapping to TOCTOU closure is promising. I checked the current head (b14c3a1), and the contract does not yet include the three fields discussed in #5888: intent_digest, target_state_digest, and continuation_id.

Two contract-level points seem worth settling before merge:

  1. Binding fields should not be optional for an executable ALLOW.

GovernanceDecision is currently total=False, and the docstring says the minimum useful decision is {decision, reason}. That is fine for descriptive logging, but not for an authorization object: an allow without decision_id, agent_id, tool, params_hash/intent_digest, issued_at, and policy context cannot be safely verified by the executor.

A small option would be either:

  • a required base TypedDict plus optional extension fields; or
  • a framework-owned validate_governance_decision() that enforces route-specific requirements.

For example, allow / require_approval should fail validation when the intent binding is absent.

  2. seq + running_count provide gap detection, not standalone proof of completeness.

They detect a missing record inside a retained sequence, but a producer can still emit a shorter contiguous stream or omit a suffix. I would soften the docstring from “prove completeness” to “detect internal gaps,” unless the run also has an externally anchored terminal count/checkpoint/root.

Suggested contract tests for the next commit:

  • changed canonical args => intent_digest mismatch => fail closed;
  • changed target_state_digest => revalidation required;
  • expired authorization => deny;
  • terminal outcome already exists for the same intent/idempotency key => deny duplicate execution;
  • require_approval resume with a different continuation_id => deny;
  • allow missing required binding fields => validation failure.

This keeps the PR vendor-neutral while making the serialized object usable as an authorization boundary rather than only an audit envelope.
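
The route-specific validation option could look roughly like the sketch below. The function name sketch_validate, the per-route field sets, and the error strings are assumptions for illustration, loosely based on the fields named in this review. They are not the PR's validate_governance_decision().

```python
from datetime import datetime, timezone
from typing import Any, Optional

# Assumption: illustrative per-route requirements, not the PR's actual rules.
REQUIRED_BY_ROUTE: dict[str, tuple[str, ...]] = {
    "allow": ("decision_id", "agent_id", "tool", "issued_at", "policy_refs"),
    "require_approval": ("decision_id", "agent_id", "tool", "issued_at"),
    "deny": ("reason",),
    "revise": ("reason",),
}
# Routes that can lead to execution must carry an intent binding.
BINDING_ROUTES = {"allow", "require_approval"}


def sketch_validate(
    decision: dict[str, Any], now: Optional[datetime] = None
) -> tuple[bool, list[str]]:
    """Hypothetical route-aware check: returns (ok, errors) and fails closed."""
    route = decision.get("decision")
    if route not in REQUIRED_BY_ROUTE:
        return False, [f"unknown decision route: {route!r}"]

    errors = [
        f"{route} requires {field}"
        for field in REQUIRED_BY_ROUTE[route]
        if not decision.get(field)
    ]
    has_binding = decision.get("intent_digest") or decision.get("params_hash")
    if route in BINDING_ROUTES and not has_binding:
        errors.append(f"{route} requires an intent binding")

    expires_at = decision.get("expires_at")
    if expires_at:
        # fromisoformat() accepts a trailing "Z" only on Python 3.11+.
        expiry = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
        if expiry.tzinfo is None:
            expiry = expiry.replace(tzinfo=timezone.utc)  # assume UTC if naive
        if expiry <= (now or datetime.now(timezone.utc)):
            errors.append("authorization expired")

    return not errors, errors
```

Under these assumptions, {"decision": "allow", "reason": "ok"} is rejected with a list of missing fields. It stays usable as a log entry but not as an authorization object.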

@nagasatish007 Thanks for taking the TOCTOU point seriously and mapping it into concrete contract fields. I reviewed the current head and left a small note on route-specific validation plus the completeness wording.

Really glad to see this move from issue discussion into an actual CrewAI contract. I’d be happy to help with compact test fixtures for exact-intent binding, target-state revalidation, continuation mismatch, and duplicate side-effect prevention as the next commit lands.

This is exactly the kind of framework-level work where a small invariant can prevent an entire class of agent failures.

@safal207

@yurukusa Your field notes are unusually valuable because they come from real long-running use rather than a synthetic benchmark. The classification trap especially feels like the beginning of a useful test category, not just a one-off bug.

I am working on a memory model where provenance answers “where did this come from?” and purpose-bound eligibility answers “what is this memory allowed to influence?” Your example is a perfect case: a note can remain valid for project continuity while being ineligible for message-authority classification.

I would genuinely like to compare a few sanitized failure cases with you — no private memory contents needed, just the pattern:

  • intended memory use;
  • phrase or label that triggered the problem;
  • unintended downstream behavior;
  • what change fixed it.

A handful of examples could become neutral regression tests for any persistent-memory implementation, including Claude Code hooks, without tying the idea to one framework.

Your PreCompact + mission.md practice also looks like a strong real-world continuity baseline. Thanks for documenting the messy parts, not only the architecture that worked.

@nagasatish007

Author

@chopmob-cloud @safal207 @vaaraio @XuebinMa @rpelevin @arian-gogani — field list is final. Latest commit has the updated GovernanceDecision contract with:

  • intent_ref (stable semantic identity, no timestamp) + receipt_ref (per-record, timestamped)
  • intent_digest, target_state_digest, continuation_id (TOCTOU closure)
  • normalization_id (declares how params were normalized before hashing)
  • normalized_scope (explicit, fails closed if missing)
  • 0-indexed seq / running_count (running_count == seq + 1)
  • GovernanceSeal terminal record (tail-drop detection)
  • validate_governance_decision() with route-specific enforcement
  • revise documented as advisory-only (no side effect)
  • RFC 8785 (JCS) mandated for all digest computation

Ready for contributions:

  • @chopmob-cloud: envelope-shaped JCS conformance vectors matching the final field list (Python + Node, byte-for-byte reproducible)
  • @safal207: fail-closed contract fixtures (exact-intent mismatch, target-state drift, continuation mismatch, idempotency violation) translated into CrewAI pytest format
  • @vaaraio: contiguity + sealing fixtures PR against the branch (now 0-indexed)
  • @XuebinMa: unwrap normalization annex + test-2/test-3 conformance cases (separate repo, referenced from contract)

Branch is open: nagasatish007:feat/governance-decision-contract

@chopmob-cloud

@nagasatish007 the conformance vectors are ready, matching the final GovernanceDecision field list: vectors/governance_decision_v1/ in https://github.com/chopmob-cloud/algovoi-jcs-conformance-vectors

It computes all five constructions from the contract, byte-for-byte reproducible across Python and an independent Node implementation (42/42 each, also validated clean-box against the published algovoi-substrate / @algovoi/substrate packages):

  • params_hash, intent_digest, intent_ref, receipt_ref, decision_context_hash over five representative decisions (allow / deny / require_approval / revise) plus a Unicode-scope case — the exact input where json.dumps(sort_keys=True) and JCS diverge, so it pins the canonicalization, not just the happy path.
  • JCS normalization vectors (the canonical bytes, not only the final hash), so a verifier can check its canonicalizer directly.
  • Negative vectors for your validate_governance_decision route rules (an allow with no policy_refs, a deny with no reason, etc.).
  • seq / seal contiguity vectors (running_count == seq + 1, sealed-total completeness).
  • The keystone as the Composability-section reference: a GovernanceDecision's intent_ref is the per-decision anchor a full identity-through-execution chain composes over, verifiable by another crew with no shared runtime.

Run python runner_python.py and node runner_node.js in that folder; both print 42/42 PASS against the same published hashes. extensions["algovoi"] can carry {"keystone_ref": "...", "jcs_vectors": "<this set>"}. The schema is yours; this is just the JCS + SHA-256 conformance layer for it. If the field list shifts, tell me and I'll regenerate to match.
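
For readers who have not run into the divergence, here is a small sketch of one way json.dumps(sort_keys=True) and JCS can disagree. The two keys are chosen for illustration and are not taken from the published vectors. RFC 8785 orders property names by UTF-16 code units, while Python compares strings by code point.

```python
import json

# U+FB33 is a BMP character; U+1F600 is encoded in UTF-16 as the surrogate
# pair D83D DE00, so its first code unit is smaller than FB33.
payload = {"\ufb33": "bmp", "\U0001f600": "astral"}

# json.dumps(sort_keys=True) sorts keys the way Python compares str values:
# by Unicode code point.
by_code_point = sorted(payload)

# RFC 8785 sorts keys by UTF-16 code units; comparing the big-endian UTF-16
# bytes gives the same order.
by_utf16_units = sorted(payload, key=lambda k: k.encode("utf-16-be"))

python_json = json.dumps(
    payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
)

# The two orderings disagree, so hashing python_json would not match a
# JCS-based digest of the same object.
assert by_code_point != by_utf16_units
```

Here python_json places the U+FB33 key first, whereas a JCS canonicalizer would place the U+1F600 key first, so any digest taken over the two byte strings differs.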

@safal207

@nagasatish007 yes — I can take those on.

I’ll prepare the fail-closed contract fixtures in CrewAI pytest shape against the finalized "GovernanceDecision" field list, covering:

  • exact-intent mismatch → "deny"
  • target-state drift → "revalidate"
  • continuation mismatch → "deny"
  • duplicate outcome / idempotency collision → "deny"

I’ll keep them pinned to the final envelope fields / route requirements so they test contract behavior rather than an intermediate shape.

Drafted the fail-closed fixture shape against the finalized GovernanceDecision contract. I’ll adapt names to the exact helper/evaluator API in this PR, but the behavioral matrix is:

| Fixture | Changed binding | Expected result |
| --- | --- | --- |
| exact-intent mismatch | approved intent_ref / action binding no longer matches candidate | deny |
| target-state drift | action still matches, but current target-state digest differs from approval-time state | revalidate |
| continuation mismatch | same action is replayed under a different continuation/session/thread | deny |
| duplicate outcome / idempotency collision | same governed outcome or idempotency key is submitted twice | first accepted, duplicate deny |

Core invariant:

authorization binds exact action + exact target state + exact continuation + non-duplicate outcome

So an allow is not a general permission; it is valid only for that exact executable candidate, in that continuation, against the expected target state, and for one non-replayed outcome.

I’ll keep these as contract fixtures rather than middleware-behavior tests, so they stay reusable if the execution hook changes.

@safal207

I drafted the fail-closed pytest fixture file for the four contract cases we discussed, but I don’t have write access to the PR branch from my side.

I can paste the file content here directly if that’s easiest, or open it from my fork if you’d prefer that flow.

The fixture file covers:

  • exact-intent mismatch → "deny"
  • target-state drift → "revalidate"
  • continuation mismatch → "deny"
  • duplicate outcome / idempotency collision → "deny"

and keeps the invariant explicit:

"authorization binds exact action + exact target state + exact continuation + non-duplicate outcome"

@nagasatish007 @chopmob-cloud thanks — the field list and conformance vectors give us a stable target now.

I’ve drafted the fail-closed pytest fixture file against the current contract shape, covering:

  • exact-intent mismatch → deny
  • target-state drift → revalidate
  • continuation mismatch → deny
  • duplicate outcome / idempotency collision → deny

I’ll keep these as contract-level fixtures so they complement the Python/Node JCS vectors rather than duplicate runtime-specific middleware behavior.

I don’t currently have write access to the PR branch, so I can either paste the patch here or open it from a fork — whichever flow you prefer.

@nagasatish007

Author

@chopmob-cloud, the vectors are exactly what the contract needs. 42/42 across Python and Node, with the Unicode-scope divergence case, is the right proof that this isn't "works on ASCII." I'll reference your vectors from the contract's "Conformance" section and add the negative validation vectors alongside our existing validate_governance_decision() test suite.

Field list is stable — no further changes planned before merge.

@safal207, go ahead. Please open a PR against the branch (nagasatish007:feat/governance-decision-contract) with the four fail-closed fixtures in pytest format. I'll review and merge directly. The helper/evaluator API is:

  • validate_governance_decision(d) -> (bool, errors) for route validation
  • verify_contiguity(records, seal=None) -> bool for seq checks
  • Intent binding verification is currently documented as contract invariants (not yet a shipped function); your fixtures should assert the expected behavior so we can build the executor-side enforcement against them.

Both of you will be co-authored on the merge commit. Appreciate the rigor.

@nagasatish007

Author

@safal207, open it from a fork. PR against nagasatish007:feat/governance-decision-contract and I'll merge directly. Co-author credit on the squash commit.

@chopmob-cloud

Thanks, glad it's useful. If the field list ever shifts post-merge I'll regenerate the set to match. Good luck with the rest of the project.

@nagasatish007 thanks — confirmed. I’m currently mobile-only and don’t have a local workstation available to open a fork PR cleanly right now, so I’m pasting the proposed pytest file here for direct application to nagasatish007:feat/governance-decision-contract.

Suggested path:

lib/crewai/tests/test_governance_decision_fail_closed_contract.py

"""
Fail-closed contract fixtures for GovernanceDecision.

These tests are deliberately contract-level. They do not depend on a concrete
middleware hook implementation. Instead, they pin the expected behavior a
runtime/evaluator must preserve when binding an authorization record to an
executable candidate.

Invariant:
    authorization binds exact action + exact target state + exact continuation
    + non-duplicate outcome
"""

from __future__ import annotations

from typing import Any, Literal

from crewai.governance.governance_decision import GovernanceDecision, GovernanceOutcome

BindingVerdict = Literal["allow", "deny", "revalidate"]


def evaluate_contract_binding(
    decision: GovernanceDecision,
    candidate: dict[str, Any],
    existing_outcomes: list[GovernanceOutcome] | None = None,
) -> tuple[BindingVerdict, str]:
    """Small test oracle for the fail-closed GovernanceDecision contract."""
    existing_outcomes = existing_outcomes or []

    # Only an explicit "allow" decision can authorize anything.
    if decision.get("decision") != "allow":
        return "deny", "decision_not_allow"

    # Actor, tool, target, and scope must match the candidate exactly.
    for field in ("agent_id", "tool", "target", "normalized_scope"):
        if decision.get(field) != candidate.get(field):
            return "deny", f"{field}_mismatch"

    # Intent bindings are checked only when the decision carries them.
    for field in ("intent_ref", "intent_digest", "params_hash"):
        if decision.get(field) and decision.get(field) != candidate.get(field):
            return "deny", "exact_intent_mismatch"

    # An approved action cannot be replayed under another continuation.
    if decision.get("continuation_id") != candidate.get("continuation_id"):
        return "deny", "continuation_mismatch"

    # Target state drift does not deny outright; it requires revalidation.
    if decision.get("target_state_digest") != candidate.get("target_state_digest"):
        return "revalidate", "target_state_drift"

    # A terminal outcome for the same decision, intent, and idempotency key
    # blocks re-execution.
    for outcome in existing_outcomes:
        same_decision = outcome.get("decision_id") == decision.get("decision_id")
        same_intent = outcome.get("intent_ref") == decision.get("intent_ref")
        same_idempotency = (
            outcome.get("extensions", {}).get("idempotency_key")
            == decision.get("idempotency_key")
        )
        terminal = outcome.get("outcome") in {"executed", "blocked", "error", "timeout"}
        if terminal and same_decision and same_intent and same_idempotency:
            return "deny", "duplicate_outcome"

    return "allow", "contract_binding_ok"


def base_allow_decision() -> GovernanceDecision:
    return {
        "decision_id": "d-fail-closed-001",
        "intent_ref": "sha256:intent-ref-approved",
        "receipt_ref": "sha256:receipt-ref-approved",
        "agent_id": "support-bot",
        "tool": "send_email",
        "request_id": "req-fail-closed-001",
        "target": "email:user@example.com",
        "normalized_scope": "email/outbound/user-summary",
        "params_hash": "sha256:params-approved",
        "intent_digest": "sha256:intent-digest-approved",
        "target_state_digest": "sha256:target-state-at-authorization",
        "continuation_id": "cont:original-thread",
        "normalization_id": "jcs-sha256",
        "idempotency_key": "idem:send-summary:user@example.com:001",
        "policy_refs": ["allow-user-summary-email-v1"],
        "decision": "allow",
        "reason": "Authorized exact outbound summary email.",
        "issued_at": "2026-06-25T14:00:00Z",
        "seq": 0,
        "running_count": 1,
    }


def matching_candidate() -> dict[str, Any]:
    return {
        "agent_id": "support-bot",
        "tool": "send_email",
        "target": "email:user@example.com",
        "normalized_scope": "email/outbound/user-summary",
        "params_hash": "sha256:params-approved",
        "intent_ref": "sha256:intent-ref-approved",
        "intent_digest": "sha256:intent-digest-approved",
        "target_state_digest": "sha256:target-state-at-authorization",
        "continuation_id": "cont:original-thread",
        "idempotency_key": "idem:send-summary:user@example.com:001",
    }


def test_exact_intent_mismatch_denies() -> None:
    """Changed executable intent must deny, even if actor/tool/target match."""
    decision = base_allow_decision()
    candidate = matching_candidate()
    candidate["intent_digest"] = "sha256:intent-digest-mutated"

    verdict, reason = evaluate_contract_binding(decision, candidate)

    assert verdict == "deny"
    assert reason == "exact_intent_mismatch"


def test_target_state_drift_revalidates() -> None:
    """Same action against changed target state requires revalidation."""
    decision = base_allow_decision()
    candidate = matching_candidate()
    candidate["target_state_digest"] = "sha256:target-state-drifted"

    verdict, reason = evaluate_contract_binding(decision, candidate)

    assert verdict == "revalidate"
    assert reason == "target_state_drift"


def test_continuation_mismatch_denies() -> None:
    """Approved action cannot be replayed under another continuation."""
    decision = base_allow_decision()
    candidate = matching_candidate()
    candidate["continuation_id"] = "cont:different-thread"

    verdict, reason = evaluate_contract_binding(decision, candidate)

    assert verdict == "deny"
    assert reason == "continuation_mismatch"


def test_duplicate_outcome_idempotency_collision_denies() -> None:
    """A terminal outcome for the same idempotency key blocks re-execution."""
    decision = base_allow_decision()
    candidate = matching_candidate()
    existing_outcome: GovernanceOutcome = {
        "decision_id": "d-fail-closed-001",
        "intent_ref": "sha256:intent-ref-approved",
        "receipt_ref": "sha256:outcome-receipt-001",
        "outcome": "executed",
        "tool_output_hash": "sha256:tool-output-001",
        "completed_at": "2026-06-25T14:00:02Z",
        "seq": 0,
        "extensions": {
            "idempotency_key": "idem:send-summary:user@example.com:001",
        },
    }

    verdict, reason = evaluate_contract_binding(
        decision,
        candidate,
        existing_outcomes=[existing_outcome],
    )

    assert verdict == "deny"
    assert reason == "duplicate_outcome"

I kept this as a contract oracle rather than executor middleware code, matching your note that intent-binding enforcement is currently documented as invariants and not yet a shipped function. If you want a smaller diff, I can also adapt these into the existing test_governance_decision_contract.py file instead of a separate file.

@nagasatish007

Author

@safal207 — clean fixtures. I'll commit this as
lib/crewai/tests/governance/test_governance_decision_fail_closed_contract.py
(under the existing governance test directory rather than root tests/) with
co-author credit. Applying as-is — the contract oracle approach is the right
level of abstraction.

One small alignment: I'm moving idempotency_key out of
outcome.extensions["idempotency_key"] and into a top-level field on
GovernanceOutcome, to match where it lives on GovernanceDecision. I'll adjust
that in the commit. Everything else goes in verbatim.
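
For readers following the alignment described above, here is a minimal sketch of how the duplicate-outcome check might read once idempotency_key is a top-level field on the outcome record. It uses plain dicts instead of the contract TypedDicts so it runs on its own; the helper name is_duplicate_outcome is hypothetical, and the committed version may differ.

```python
from __future__ import annotations

from typing import Any

# Terminal outcome values, taken from the fail-closed oracle in this thread.
TERMINAL_OUTCOMES = {"executed", "blocked", "error", "timeout"}


def is_duplicate_outcome(
    decision: dict[str, Any],
    outcomes: list[dict[str, Any]],
) -> bool:
    """Return True if a terminal outcome already exists for this decision.

    Assumption: idempotency_key is a top-level key on the outcome record,
    mirroring where it lives on the decision record.
    """
    for outcome in outcomes:
        if (
            outcome.get("outcome") in TERMINAL_OUTCOMES
            and outcome.get("decision_id") == decision.get("decision_id")
            and outcome.get("intent_ref") == decision.get("intent_ref")
            and outcome.get("idempotency_key") == decision.get("idempotency_key")
        ):
            return True
    return False


decision = {
    "decision_id": "d-fail-closed-001",
    "intent_ref": "sha256:intent-ref-approved",
    "idempotency_key": "idem:send-summary:user@example.com:001",
}
prior_outcome = {
    "decision_id": "d-fail-closed-001",
    "intent_ref": "sha256:intent-ref-approved",
    "idempotency_key": "idem:send-summary:user@example.com:001",
    "outcome": "executed",
}
```

With these records, is_duplicate_outcome(decision, [prior_outcome]) returns True, and changing the outcome's idempotency_key makes it return False.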

Add fail-closed contract tests for GovernanceDecision.

Thanks — really appreciate you picking this up and landing it in the governance test layout.

That path works well for me. Keeping these as contract-level fail-closed fixtures under the existing governance test structure is exactly the right shape.

And thank you for the co-author credit — glad this could be useful to the PR.

@vaaraio

vaaraio commented Jun 25, 2026

The conformance set here looks solid on the two axes it covers: the JCS derivation vectors (the Unicode-scope divergence is the right trap, it kills "works on ASCII") and the four fail-closed oracles.

One axis I don't see pinned yet is sealed completeness: proving a record wasn't dropped, from the held set alone. It matters because the two drop modes aren't symmetric:

  • a mid-stream gap is self-evident from the running count (seq 3 present, seq 2 missing),
  • but a dropped tail is invisible without a boundary total to check against. That's what the terminal GovernanceSeal total is for.
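
The asymmetry between the two drop modes can be sketched in a few lines. This is an illustration only, not the contract's verify_contiguity or the Vaara checker: the function name, the verdict strings, and the assumption that seq starts at 0 and that the seal pins a record count are all hypothetical, and signatures are left out.

```python
from __future__ import annotations


def check_completeness(held_seqs: list[int], sealed_total: int | None = None) -> str:
    """Classify a held record set by its seq numbers.

    sealed_total is the record count pinned by a terminal seal, or None
    when no seal is held. Assumes seq numbering starts at 0.
    """
    # A contiguous set of n records must carry exactly seq 0..n-1.
    if sorted(held_seqs) != list(range(len(held_seqs))):
        return "gap"  # mid-stream drop: visible from the held set alone
    if sealed_total is None:
        return "unsealed"  # contiguous, but a dropped tail cannot be ruled out
    if len(held_seqs) < sealed_total:
        return "tail_dropped"  # contiguous prefix shorter than the sealed total
    if len(held_seqs) > sealed_total:
        return "seal_mismatch"  # more records than the seal accounts for
    return "complete"
```

A held set of [0, 1] classifies as "tail_dropped" against a seal total of 3, but only as "unsealed" when no seal is available, which is the residual case described above.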

I put together a small vector set for exactly that case, in the same envelope: complete, dropped (mid-gap at seq 2), tail_sealed (suffix drop caught only because the seal pins the total), and tail_unsealed (the residual: with no seal a held prefix looks whole and passes). Each is a signed {record, signature}, and the checker reproduces every verdict with no framework import, just rfc8785 + cryptography:

https://github.com/vaaraio/vaara/tree/main/tests/vectors/governance_decision_v0

Offered as a drop-in for the completeness row of the Conformance section, next to the derivation and fail-closed vectors already here. It builds straight on the sealing the contract already carries.

@chopmob-cloud

Folded the completeness axis into the conformance set, so the Conformance section has one reference across all three: JCS derivation, the fail-closed route oracles, and sealed completeness.

@vaaraio your tail-drop point is the part worth getting exact — the two drop modes aren't symmetric. governance_decision_v1 now carries the four cases, derived from the contract's own GovernanceSeal:

  • complete — sealed 0..2, verifies
  • mid-gap — seq 1 dropped, self-evident from the running count
  • tail-sealed — a dropped suffix, caught only because the seal pins the total
  • tail-unsealed — the honest residual: with no seal, a held prefix looks whole and passes; no per-record field closes it (an external RFC 3161 anchor would)

44/44 Python == Node byte-for-byte, VM2 clean-box against the published packages. Same set, same place: vectors/governance_decision_v1/.

@vaaraio This is a strong addition — agreed that sealed completeness is a distinct conformance axis and that the dropped-tail case is the one a held prefix cannot self-diagnose without a terminal total.

I also think there is a useful temporal reading of the same seal semantics: completeness is not only "was the set whole?" but "was the set whole for the phase in which this decision remained actionable?" A decision record can stay auditable after it stops being spendable, so the completeness check should be able to distinguish:

  • complete and still actionable;
  • complete but only historically auditable;
  • sealed tail loss detected before action;
  • sealed tail loss detected only during later audit / replay.

That suggests a nice follow-on fixture family alongside your completeness vectors: same signed envelope shape, but with validity-window / phase metadata so the verifier can prove both sealed completeness and whether the record was complete during the authority phase that mattered.

So I’d be very in favor of treating your completeness set as the new base row, then later adding a temporal-phase row over the same envelope: fresh, resumed, stale, expired-but-auditable, and revalidated completeness.

Thanks for putting the vector set together — it fits the contract direction well.

@safal207

@vaaraio @nagasatish007 One follow-on split that may be worth preserving, if completeness stays in the conformance set, is structural completeness vs phase-valid completeness.

Structural completeness asks:

  • is the record set whole as a sequence / seal set?
  • are there missing middle records or a dropped tail?

Phase-valid completeness asks a different question:

  • was the record set complete for the phase in which the decision remained actionable?

That distinction matters because the same envelope can be complete enough to remain historically auditable while no longer being complete enough to safely spend as live authority.

So I’d expect a later fixture family over the same signed shape to distinguish at least:

  • complete and still actionable;
  • complete but only historically auditable;
  • tail loss detected before action;
  • tail loss discovered only during later audit / replay;
  • stale/expired authority where the record remains durable but no longer spendable.

I don’t think this needs to expand the current PR scope, but if completeness is now part of the contract surface, that temporal split feels like the next useful row in the conformance matrix.

@chopmob-cloud

@vaaraio credited your framing of the sealed-completeness axis in the set's README (vectors/governance_decision_v1/README.md). The tail-drop asymmetry was the right call and it sharpened the set — thanks for that.

@rpelevin

I would treat the sealed-completeness vectors as the base set, then add temporal phase as a separate conformance row rather than broadening the base contract.

The verifier question is not only whether the retained sequence is whole. It is whether the sequence was whole during the phase in which a decision could still be spent.

I would make the phase row small:

  1. accept fresh complete sequence with valid authority window and terminal seal;
  2. accept expired complete sequence as audit evidence only, not spendable authority;
  3. reject resumed authority when the prior seal is complete but the validity window or continuation binding is stale;
  4. reject tail loss discovered before execution;
  5. reject tail loss discovered during replay from being retroactively treated as authorized execution;
  6. require revalidation to mint a new decision identity instead of extending the old one silently.

That keeps sealed completeness and temporal authority related but separate. The seal proves the set was whole. The phase check proves whether that whole set was still allowed to authorize execution at the point of use.
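
A rough sketch of that separation, under stated assumptions: the function name phase_verdict, the valid_until parameter, and the three verdict strings are hypothetical and do not appear in the contract. Here "audit_only" means the record stays usable as evidence but must not be spent, so a resume attempt against it is refused. Minting a new decision identity on revalidation (item 6) is outside this sketch.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def phase_verdict(
    sealed_complete: bool,
    valid_until: datetime,
    continuation_matches: bool,
    now: datetime,
) -> str:
    """Combine the seal check with the authority-window check.

    sealed_complete is the result of a separate completeness check;
    this function only decides what a whole set may still be used for.
    """
    if not sealed_complete:
        return "reject"  # tail or mid-stream loss is never spendable
    if now > valid_until:
        return "audit_only"  # whole, but the authority window has closed
    if not continuation_matches:
        return "reject"  # whole and fresh, but bound to another continuation
    return "spendable"


issued_at = datetime(2026, 6, 25, 14, 0, tzinfo=timezone.utc)
valid_until = issued_at + timedelta(minutes=5)
```

Keeping sealed_complete as an input, not something recomputed here, mirrors the point that the seal and the phase check answer different questions.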

Boundary: architecture and fixture-shape feedback only; no claim about using this project, running the branch, or validating implementation behavior.

@vaaraio

vaaraio commented Jun 25, 2026

This shipped in v1.15.0 today.

The proxy intercepts tools/call, mints a credential tied to the attestation digest and argument commitment, then blocks the call before reaching the upstream if the credential doesn't check out. A downstream CredentialGateway verifies it with no round-trip to the proxy.

Config is one flag pair: vaara-mcp-proxy --attest-signing-key key.pem --tool-constraints constraints.json, where constraints.json carries per-tool capability constraints:

{
  "tools": {
    "read_file": [{"arg": "path", "op": "in", "value": ["/tmp", "/var/data"]}]
  }
}

Each allowed call gets a JWT at params._meta["vaara/credential"]. A constrained tool with no valid credential gets MCP error -32603. The upstream never sees the request.

Spec: SEP-2828. Corpus and verifier (no Vaara import needed): vaaraio/vaara.

pip install vaara==1.15.0

@rpelevin

Good proof point. The important property here is that the upstream tool never sees an uncredentialed constrained call.

I would turn this into a small conformance row alongside the GovernanceDecision contract:

  1. proxy computes the same argument commitment the decision record names;
  2. issued credential binds tool name, normalized arguments, constraint profile, phase, and decision id;
  3. downstream verifier accepts without a proxy round trip only when those bindings match;
  4. missing credential, stale phase, wrong constraint profile, changed arguments, or replayed credential fails before upstream call;
  5. deny and fail-closed outcomes emit positive GovernanceOutcome records rather than relying on absence of a tool response.

The key fixture is a constrained tool call with a valid credential, then the same call with one field changed. The first reaches the gateway. The second is blocked before the upstream tool sees it, and the resulting record explains which binding failed.
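
That fixture can be sketched with the standard library alone. This is not Vaara's implementation, which issues a JWT: an HMAC tag stands in for the credential, sorted-key json.dumps stands in for JCS canonicalization (the two are not equivalent for all inputs), and every function name below is hypothetical.

```python
from __future__ import annotations

import hashlib
import hmac
import json
from typing import Any


def argument_commitment(arguments: dict[str, Any]) -> str:
    """Hash a canonical encoding of the arguments (stand-in for JCS)."""
    encoded = json.dumps(arguments, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(encoded.encode("utf-8")).hexdigest()


def mint_credential(
    key: bytes, decision_id: str, tool: str, arguments: dict[str, Any]
) -> str:
    """Bind decision id, tool name, and argument commitment into one tag."""
    message = "|".join([decision_id, tool, argument_commitment(arguments)])
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_credential(
    key: bytes,
    credential: str | None,
    decision_id: str,
    tool: str,
    arguments: dict[str, Any],
) -> tuple[bool, str]:
    """Check a credential against the call actually being attempted."""
    if not credential:
        return False, "missing_credential"
    expected = mint_credential(key, decision_id, tool, arguments)
    # Constant-time comparison; any changed binding yields a different tag.
    if not hmac.compare_digest(credential, expected):
        return False, "binding_mismatch"
    return True, "credential_ok"


key = b"demo-key-not-for-production"
credential = mint_credential(key, "d-001", "read_file", {"path": "/tmp/report.txt"})
```

Verifying the same call succeeds, while changing only the path argument fails with "binding_mismatch". A single opaque tag cannot say which binding failed; a record that names the failed field would need the bindings carried as separate, individually checkable claims.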

That makes the release more than an implementation detail. It proves the contract can be enforced at the MCP boundary without making the application tool responsible for rediscovering authorization.

@vaaraio Thanks — this is a strong reference implementation for the contract boundary discussed here.

The important property is not only that a credential exists, but that it is bound to the attestation digest and argument commitment, independently verifiable downstream, and rejected before the constrained call reaches the upstream. That gives us a concrete implementation shape for the credential_bound_tool_authority conformance case.

I would keep the credential and the execution receipt explicitly distinct in the contract:

  • the credential carries spendable authority for the exact scoped call;
  • the decision/outcome records explain and prove what was authorized and what actually happened;
  • a missing, expired, tenant-mismatched, or argument-mismatched credential must fail closed before side effect;
  • the deny path should still produce a positive governance outcome rather than only an absent upstream response.

The verifier and corpus being usable without importing Vaara is especially useful: it makes this viable as an external conformance target rather than a framework dependency. I’ll use the v1.15.0 behavior as a reference when tightening the corresponding LS fixture and cross-engine vectors.

@vaaraio This is very useful. credential_binding_v0 looks like the right level of specificity for a cross-engine fixture: it tests the spendability of authority at runtime without requiring the framework itself to know Vaara internals.

The minimal shared naming I’d suggest keeping stable across LS / Vaara-style fixtures is:

  • pos_valid_grant: credential matches runtime tool, args/commitment, tenant, and attestation scope;
  • neg_wrong_tool: credential minted for one tool cannot authorize another;
  • neg_wrong_args: changed argument commitment fails before upstream;
  • neg_wrong_tenant: tenant/scope mismatch fails before upstream;
  • neg_expired_or_superseded: stale authority cannot be spent;
  • neg_missing_credential: constrained call without credential fails closed;
  • pos_blocked_outcome: deny path still emits a concrete governance outcome/receipt.

That gives us a clean boundary:

  1. normalization decides what exact action identity the decision binds to;
  2. credential binding decides whether that authority can be spent now;
  3. outcome/receipt proves what happened, including blocked-before-upstream cases.

If v1.16.0 ships those cases as a corpus/verifier, I’d treat it as a reference fixture for the LS credential_bound_tool_authority track.

Development

Successfully merging this pull request may close these issues.

[FEATURE]:Governance middleware hook for tool call authorization

6 participants