feat: Add GovernanceDecision and GovernanceOutcome contract types#6030
feat: Add GovernanceDecision and GovernanceOutcome contract types#6030nagasatish007 wants to merge 21 commits into
Conversation
This module defines the GovernanceDecision and GovernanceOutcome TypedDicts for vendor-neutral governance hooks in CrewAI. It specifies the structure and fields for pre-execution and post-execution records used in governance processes.
This file contains contract tests for GovernanceDecision and GovernanceOutcome, validating decision routes, JSON serialization, and outcome references.
|
Caution Review failedAn error occurred during the review process. Please try again later. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
lib/crewai/src/crewai/governance/__init__.py (1)
1-2: ⚡ Quick winExport types from
__init__.pyfor better ergonomics.The governance module's
__init__.pyis empty, requiring users to import from the full submodule path. Exporting the public types would make imports more convenient and discoverable.📦 Proposed export pattern
+""" +CrewAI governance contracts. +""" + +from crewai.governance.governance_decision import ( + GovernanceDecision, + GovernanceOutcome, +) + +__all__ = ["GovernanceDecision", "GovernanceOutcome"]This allows cleaner imports:
from crewai.governance import GovernanceDecision, GovernanceOutcomeinstead of:
from crewai.governance.governance_decision import GovernanceDecision, GovernanceOutcome🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/crewai/src/crewai/governance/__init__.py` around lines 1 - 2, Add public exports to the governance package by importing and re-exporting the key types in the package __init__.py: import GovernanceDecision and GovernanceOutcome (and any other public classes) from their defining module (e.g., governance_decision) and add them to the package's __all__ so users can do "from crewai.governance import GovernanceDecision, GovernanceOutcome"; ensure you reference the exact class names GovernanceDecision and GovernanceOutcome and any other public symbols you want exposed.test_governance_decision_contract.py (1)
98-103: ⚡ Quick winConsider adding error outcome fixture for completeness.
The current outcome fixture only tests
outcome="executed". SinceGovernanceOutcomedefineserror_typeanderror_messagefields and includes"error"as a valid outcome, adding a fixture that demonstrates the error case would strengthen contract coverage.🧪 Proposed error outcome fixture
FIXTURE_OUTCOME_ERROR: GovernanceOutcome = { "decision_id": "d-002", # Link to FIXTURE_DENY or any decision "outcome": "error", "error_type": "PermissionError", "error_message": "Insufficient permissions to execute tool", "completed_at": "2026-06-03T14:01:05Z", }Then add to
test_all_fixtures_json_serializable:fixtures: list[dict[str, Any]] = [ FIXTURE_ALLOW, FIXTURE_DENY, FIXTURE_REQUIRE_APPROVAL, FIXTURE_ALLOW_WITH_EXTENSION, FIXTURE_REVISE, FIXTURE_OUTCOME, + FIXTURE_OUTCOME_ERROR, FIXTURE_UNKNOWN_EXTENSION, ]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test_governance_decision_contract.py` around lines 98 - 103, Add an error-case fixture for GovernanceOutcome named FIXTURE_OUTCOME_ERROR that sets "outcome" to "error" and includes "error_type" and "error_message" plus "decision_id" and "completed_at" to mirror FIXTURE_OUTCOME structure; then include FIXTURE_OUTCOME_ERROR in the test_all_fixtures_json_serializable assertions so the error-path (outcome="error") is covered by serialization tests (reference symbols: FIXTURE_OUTCOME, FIXTURE_OUTCOME_ERROR, GovernanceOutcome, test_all_fixtures_json_serializable).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@lib/crewai/src/crewai/governance/__init__.py`:
- Around line 1-2: Add public exports to the governance package by importing and
re-exporting the key types in the package __init__.py: import GovernanceDecision
and GovernanceOutcome (and any other public classes) from their defining module
(e.g., governance_decision) and add them to the package's __all__ so users can
do "from crewai.governance import GovernanceDecision, GovernanceOutcome"; ensure
you reference the exact class names GovernanceDecision and GovernanceOutcome and
any other public symbols you want exposed.
In `@test_governance_decision_contract.py`:
- Around line 98-103: Add an error-case fixture for GovernanceOutcome named
FIXTURE_OUTCOME_ERROR that sets "outcome" to "error" and includes "error_type"
and "error_message" plus "decision_id" and "completed_at" to mirror
FIXTURE_OUTCOME structure; then include FIXTURE_OUTCOME_ERROR in the
test_all_fixtures_json_serializable assertions so the error-path
(outcome="error") is covered by serialization tests (reference symbols:
FIXTURE_OUTCOME, FIXTURE_OUTCOME_ERROR, GovernanceOutcome,
test_all_fixtures_json_serializable).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 00f632c3-6eb2-4dd0-89e0-6bb603ca4224
📒 Files selected for processing (3)
governance_decision.pylib/crewai/src/crewai/governance/__init__.pytest_governance_decision_contract.py
|
Thanks, Naga. Directionally this is the right contract shape. One mechanical thing I would fix before merge: I would also move the contract test into CrewAI's test tree, e.g. Field names and boundary look right to me: Optional but useful: add one |
This file contains contract tests for GovernanceDecision and GovernanceOutcome, validating decision routes, JSON serialization, and outcome references.
This module defines the GovernanceDecision and GovernanceOutcome types for vendor-neutral governance hooks in CrewAI, including their fields and documentation.
Added test for error outcomes to validate error_type and error_message fields.
|
@rpelevin the changes have been done. |
|
Thanks, Naga. I went through the latest commits. This addresses the core things I called out: the contract now lives under crewai.governance, the types are exported from the package, GovernanceDecision and GovernanceOutcome stay separate with decision_id linking them, and the extension round-trip plus error-outcome coverage are now in the tests. From my side the field names and vendor-neutral boundary look right. I would avoid adding more scope here and let the CrewAI maintainers decide the final test layout and merge mechanics. Great work getting this tightened up. |
Adds monotonic seq and running_count fields to GovernanceDecision and GovernanceOutcome as core (non-extension) fields for completeness evidence. A verifier holding N records can prove no records were dropped: - seq must form contiguous 1..N (gap = provable omission) - max(running_count) must equal len(records) Includes verify_contiguity() utility and 4 contract tests for the gap case. Ref: vaaraio/vaara#283 (working implementation) Co-authored-by: rpelevin
Add seq and running_count fields to governance decision fixtures and tests for omission detection.
|
@nagasatish007 here are the contract-test fixtures you invited, pinned to the base the shipped reference uses and extended to the tail-drop case the contract cannot express yet. On the index base. The fixtures and the verifier have to agree, or On the sealing record. The per-record Fixtures and a sealing-aware verifier. Every assertion here also passes against the shipped from collections import Counter
def verify_contiguity(records: list[dict]) -> bool:
"""0-indexed, sealing-aware. expected = max(max_seq + 1, max_running, sealed_total)."""
seq_records = [r for r in records if not r.get("sealed")]
sealed_total = max((int(r["total"]) for r in records if r.get("sealed")), default=0)
if not seq_records:
return sealed_total == 0 # a seal over zero held records is a fully dropped run
seqs = [int(r["seq"]) for r in seq_records]
counts = [int(r["running_count"]) for r in seq_records]
expected = max(max(seqs) + 1, max(counts), sealed_total)
duplicates = [s for s, n in Counter(seqs).items() if n > 1]
missing = sorted(set(range(expected)) - set(seqs))
count_mismatch = [r for r in seq_records if int(r["running_count"]) != int(r["seq"]) + 1]
return not missing and not duplicates and not count_mismatch and len(seq_records) == expected
# 0-indexed: the first decision is seq 0, running_count 1.
FIXTURE_CONTIGUOUS_RUN = [
{"decision_id": "d-101", "tool": "search", "decision": "allow", "seq": 0, "running_count": 1},
{"decision_id": "d-102", "tool": "calc", "decision": "allow", "seq": 1, "running_count": 2},
{"decision_id": "d-103", "tool": "write", "decision": "deny", "seq": 2, "running_count": 3},
]
FIXTURE_SEQ_GAP = [
{"decision_id": "d-201", "tool": "search", "decision": "allow", "seq": 0, "running_count": 1},
{"decision_id": "d-202", "tool": "calc", "decision": "allow", "seq": 1, "running_count": 2},
# seq 2 missing: provable interior gap
{"decision_id": "d-204", "tool": "deploy", "decision": "allow", "seq": 3, "running_count": 4},
]
FIXTURE_RUNNING_COUNT_MISMATCH = [
{"decision_id": "d-301", "tool": "search", "decision": "allow", "seq": 0, "running_count": 1},
# running_count 4 at seq 1 says a later record existed; only 2 held
{"decision_id": "d-302", "tool": "calc", "decision": "allow", "seq": 1, "running_count": 4},
]
# Tail drop CAUGHT by the seal: held 0..2, seal pins total 4, the 4th decision is gone.
FIXTURE_TAIL_DROP_SEALED = [
{"decision_id": "d-401", "tool": "search", "decision": "allow", "seq": 0, "running_count": 1},
{"decision_id": "d-402", "tool": "calc", "decision": "allow", "seq": 1, "running_count": 2},
{"decision_id": "d-403", "tool": "write", "decision": "allow", "seq": 2, "running_count": 3},
{"boundary_id": "crew-run-1", "sealed": True, "total": 4},
]
# The same tail drop WITHOUT the seal reads as complete: the irreducible residual.
FIXTURE_TAIL_DROP_NO_SEAL = FIXTURE_TAIL_DROP_SEALED[:3]
# A finalized run with no drop passes.
FIXTURE_SEALED_WHOLE = FIXTURE_CONTIGUOUS_RUN + [{"boundary_id": "crew-run-1", "sealed": True, "total": 3}]
def test_contiguous_run_passes():
assert verify_contiguity(FIXTURE_CONTIGUOUS_RUN) is True
def test_interior_gap_fails():
assert verify_contiguity(FIXTURE_SEQ_GAP) is False
def test_running_count_omission_fails():
assert verify_contiguity(FIXTURE_RUNNING_COUNT_MISMATCH) is False
def test_tail_drop_caught_by_seal():
assert verify_contiguity(FIXTURE_TAIL_DROP_SEALED) is False
def test_tail_drop_without_seal_is_the_residual():
assert verify_contiguity(FIXTURE_TAIL_DROP_NO_SEAL) is True
def test_sealed_whole_run_passes():
assert verify_contiguity(FIXTURE_SEALED_WHOLE) is TrueI can open this as a PR against your branch instead, if you would rather review it as a diff. Tag me either way. |
|
@vaaraio — these fixtures are exactly what the PR needs. The sealing-aware verifier and the tail-drop test pair ( On the index base: You've convinced me. Switching the contract to 0-indexed. Your argument is right: the reference impl is 0-indexed, the published spec pins 0-indexed, and asking every 0-indexed emitter to add On the seal: Adding the
The irreducible residual (suffix drop + suppressed seal) is honest and documented — no field can close it. That's the right thing to say in the contract spec rather than pretend it's solved. Please open the PR against the branch ( The branch is open for push at: Appreciate the rigor here — this thread produced a better contract than any of us would have written alone. |
safal207
left a comment
There was a problem hiding this comment.
Strong direction, and the issue-thread mapping to TOCTOU closure is promising. I checked the current head (b14c3a1), and the contract still does not yet include the three fields discussed in #5888: intent_digest, target_state_digest, and continuation_id.
Two contract-level points seem worth settling before merge:
- Binding fields should not be optional for an executable ALLOW.
GovernanceDecision is currently total=False, and the docstring says the minimum useful decision is {decision, reason}. That is fine for descriptive logging, but not for an authorization object: an allow without decision_id, agent_id, tool, params_hash/intent_digest, issued_at, and policy context cannot be safely verified by the executor.
A small option would be either:
- a required base TypedDict plus optional extension fields; or
- a framework-owned
validate_governance_decision()that enforces route-specific requirements.
For example, allow / require_approval should fail validation when the intent binding is absent.
seq+running_countprovide gap detection, not standalone proof of completeness.
They detect a missing record inside a retained sequence, but a producer can still emit a shorter contiguous stream or omit a suffix. I would soften the docstring from “prove completeness” to “detect internal gaps,” unless the run also has an externally anchored terminal count/checkpoint/root.
Suggested contract tests for the next commit:
- changed canonical args =>
intent_digestmismatch => fail closed; - changed
target_state_digest=> revalidation required; - expired authorization => deny;
- terminal outcome already exists for the same intent/idempotency key => deny duplicate execution;
require_approvalresume with a differentcontinuation_id=> deny;allowmissing required binding fields => validation failure.
This keeps the PR vendor-neutral while making the serialized object usable as an authorization boundary rather than only an audit envelope.
|
@nagasatish007 Thanks for taking the TOCTOU point seriously and mapping it into concrete contract fields. I reviewed the current head and left a small note on route-specific validation plus the completeness wording. Really glad to see this move from issue discussion into an actual CrewAI contract. I’d be happy to help with compact test fixtures for exact-intent binding, target-state revalidation, continuation mismatch, and duplicate side-effect prevention as the next commit lands. This is exactly the kind of framework-level work where a small invariant can prevent an entire class of agent failures. |
|
@yurukusa Your field notes are unusually valuable because they come from real long-running use rather than a synthetic benchmark. The classification trap especially feels like the beginning of a useful test category, not just a one-off bug. I am working on a memory model where provenance answers “where did this come from?” and purpose-bound eligibility answers “what is this memory allowed to influence?” Your example is a perfect case: a note can remain valid for project continuity while being ineligible for message-authority classification. I would genuinely like to compare a few sanitized failure cases with you — no private memory contents needed, just the pattern:
A handful of examples could become neutral regression tests for any persistent-memory implementation, including Claude Code hooks, without tying the idea to one framework. Your PreCompact + mission.md practice also looks like a strong real-world continuity baseline. Thanks for documenting the messy parts, not only the architecture that worked. |
|
@chopmob-cloud @safal207 @vaaraio @XuebinMa @rpelevin @arian-gogani — field list is final. Latest commit has the updated
Ready for contributions:
Branch is open: |
|
@nagasatish007 the conformance vectors are ready, matching the final It computes all five constructions from the contract, byte-for-byte reproducible across Python and an independent Node implementation (42/42 each, also validated clean-box against the published
Run |
|
@nagasatish007 yes — I can take those on. I’ll prepare the fail-closed contract fixtures in CrewAI pytest shape against the finalized "GovernanceDecision" field list, covering:
I’ll keep them pinned to the final envelope fields / route requirements so they test contract behavior rather than an intermediate shape. |
|
Drafted the fail-closed fixture shape against the finalized
Core invariant: So an I’ll keep these as contract fixtures rather than middleware-behavior tests, so they stay reusable if the execution hook changes. |
|
I drafted the fail-closed pytest fixture file for the four contract cases we discussed, but I don’t have write access to the PR branch from my side. I can paste the file content here directly if that’s easiest, or open it from my fork if you’d prefer that flow. The fixture file covers:
and keeps the invariant explicit: "authorization binds exact action + exact target state + exact continuation + non-duplicate outcome" |
|
@nagasatish007 @chopmob-cloud thanks — the field list and conformance vectors give us a stable target now. I’ve drafted the fail-closed pytest fixture file against the current contract shape, covering:
I’ll keep these as contract-level fixtures so they complement the Python/Node JCS vectors rather than duplicate runtime-specific middleware behavior. I don’t currently have write access to the PR branch, so I can either paste the patch here or open it from a fork — whichever flow you prefer. |
|
@chopmob-cloud — the vectors are exactly what the contract needs. 42/42 across Field list is stable — no further changes planned before merge. @safal207 — go ahead. Please open a PR against the branch
Both of you will be co-authored on the merge commit. Appreciate the rigor. |
|
@safal207 — open it from a fork. PR against |
|
Thanks, glad it's useful. If the field list ever shifts post-merge I'll regenerate the set to match. Good luck with the rest of the project. |
|
@nagasatish007 thanks — confirmed. I’m currently mobile-only and don’t have a local workstation available to open a fork PR cleanly right now, so I’m pasting the proposed pytest file here for direct application to Suggested path:
"""
Fail-closed contract fixtures for GovernanceDecision.
These tests are deliberately contract-level. They do not depend on a concrete
middleware hook implementation. Instead, they pin the expected behavior a
runtime/evaluator must preserve when binding an authorization record to an
executable candidate.
Invariant:
authorization binds exact action + exact target state + exact continuation
+ non-duplicate outcome
"""
from __future__ import annotations
from typing import Any, Literal
from crewai.governance.governance_decision import GovernanceDecision, GovernanceOutcome
BindingVerdict = Literal["allow", "deny", "revalidate"]
def evaluate_contract_binding(
decision: GovernanceDecision,
candidate: dict[str, Any],
existing_outcomes: list[GovernanceOutcome] | None = None,
) -> tuple[BindingVerdict, str]:
"""Small test oracle for the fail-closed GovernanceDecision contract."""
existing_outcomes = existing_outcomes or []
if decision.get("decision") != "allow":
return "deny", "decision_not_allow"
for field in ("agent_id", "tool", "target", "normalized_scope"):
if decision.get(field) != candidate.get(field):
return "deny", f"{field}_mismatch"
for field in ("intent_ref", "intent_digest", "params_hash"):
if decision.get(field) and decision.get(field) != candidate.get(field):
return "deny", "exact_intent_mismatch"
if decision.get("continuation_id") != candidate.get("continuation_id"):
return "deny", "continuation_mismatch"
if decision.get("target_state_digest") != candidate.get("target_state_digest"):
return "revalidate", "target_state_drift"
for outcome in existing_outcomes:
same_decision = outcome.get("decision_id") == decision.get("decision_id")
same_intent = outcome.get("intent_ref") == decision.get("intent_ref")
same_idempotency = (
outcome.get("extensions", {}).get("idempotency_key")
== decision.get("idempotency_key")
)
terminal = outcome.get("outcome") in {"executed", "blocked", "error", "timeout"}
if terminal and same_decision and same_intent and same_idempotency:
return "deny", "duplicate_outcome"
return "allow", "contract_binding_ok"
def base_allow_decision() -> GovernanceDecision:
return {
"decision_id": "d-fail-closed-001",
"intent_ref": "sha256:intent-ref-approved",
"receipt_ref": "sha256:receipt-ref-approved",
"agent_id": "support-bot",
"tool": "send_email",
"request_id": "req-fail-closed-001",
"target": "email:user@example.com",
"normalized_scope": "email/outbound/user-summary",
"params_hash": "sha256:params-approved",
"intent_digest": "sha256:intent-digest-approved",
"target_state_digest": "sha256:target-state-at-authorization",
"continuation_id": "cont:original-thread",
"normalization_id": "jcs-sha256",
"idempotency_key": "idem:send-summary:user@example.com:001",
"policy_refs": ["allow-user-summary-email-v1"],
"decision": "allow",
"reason": "Authorized exact outbound summary email.",
"issued_at": "2026-06-25T14:00:00Z",
"seq": 0,
"running_count": 1,
}
def matching_candidate() -> dict[str, Any]:
return {
"agent_id": "support-bot",
"tool": "send_email",
"target": "email:user@example.com",
"normalized_scope": "email/outbound/user-summary",
"params_hash": "sha256:params-approved",
"intent_ref": "sha256:intent-ref-approved",
"intent_digest": "sha256:intent-digest-approved",
"target_state_digest": "sha256:target-state-at-authorization",
"continuation_id": "cont:original-thread",
"idempotency_key": "idem:send-summary:user@example.com:001",
}
def test_exact_intent_mismatch_denies() -> None:
"""Changed executable intent must deny, even if actor/tool/target match."""
decision = base_allow_decision()
candidate = matching_candidate()
candidate["intent_digest"] = "sha256:intent-digest-mutated"
verdict, reason = evaluate_contract_binding(decision, candidate)
assert verdict == "deny"
assert reason == "exact_intent_mismatch"
def test_target_state_drift_revalidates() -> None:
"""Same action against changed target state requires revalidation."""
decision = base_allow_decision()
candidate = matching_candidate()
candidate["target_state_digest"] = "sha256:target-state-drifted"
verdict, reason = evaluate_contract_binding(decision, candidate)
assert verdict == "revalidate"
assert reason == "target_state_drift"
def test_continuation_mismatch_denies() -> None:
"""Approved action cannot be replayed under another continuation."""
decision = base_allow_decision()
candidate = matching_candidate()
candidate["continuation_id"] = "cont:different-thread"
verdict, reason = evaluate_contract_binding(decision, candidate)
assert verdict == "deny"
assert reason == "continuation_mismatch"
def test_duplicate_outcome_idempotency_collision_denies() -> None:
"""A terminal outcome for the same idempotency key blocks re-execution."""
decision = base_allow_decision()
candidate = matching_candidate()
existing_outcome: GovernanceOutcome = {
"decision_id": "d-fail-closed-001",
"intent_ref": "sha256:intent-ref-approved",
"receipt_ref": "sha256:outcome-receipt-001",
"outcome": "executed",
"tool_output_hash": "sha256:tool-output-001",
"completed_at": "2026-06-25T14:00:02Z",
"seq": 0,
"extensions": {
"idempotency_key": "idem:send-summary:user@example.com:001",
},
}
verdict, reason = evaluate_contract_binding(
decision,
candidate,
existing_outcomes=[existing_outcome],
)
assert verdict == "deny"
assert reason == "duplicate_outcome"I kept this as a contract oracle rather than executor middleware code, matching your note that intent-binding enforcement is currently documented as invariants and not yet a shipped function. If you want a smaller diff, I can also adapt these into the existing |
|
@safal207 — clean fixtures. I'll commit this as One small alignment: moving |
Add fail-closed contract tests for GovernanceDecision.
|
Thanks — really appreciate you picking this up and landing it in the governance test layout. That path works well for me. Keeping these as contract-level fail-closed fixtures under the existing governance test structure is exactly the right shape. And thank you for the co-author credit — glad this could be useful to the PR. |
|
The conformance set here looks solid on the two axes it covers: the JCS derivation vectors (the Unicode-scope divergence is the right trap, it kills "works on ASCII") and the four fail-closed oracles. One axis I don't see pinned yet is sealed completeness: proving a record wasn't dropped, from the held set alone. It matters because the two drop modes aren't symmetric:
I put together a small vector set for exactly that case, in the same envelope: https://github.com/vaaraio/vaara/tree/main/tests/vectors/governance_decision_v0 Offered as a drop-in for the completeness row of the Conformance section, next to the derivation and fail-closed vectors already here. It builds straight on the sealing the contract already carries. |
|
Folded the completeness axis into the conformance set, so the Conformance section has one reference across all three: JCS derivation, the fail-closed route oracles, and sealed completeness. @vaaraio your tail-drop point is the part worth getting exact — the two drop modes aren't symmetric.
44/44 Python == Node byte-for-byte, VM2 clean-box against the published packages. Same set, same place: |
|
@vaaraio This is a strong addition — agreed that sealed completeness is a distinct conformance axis and that the dropped-tail case is the one a held prefix cannot self-diagnose without a terminal total. I also think there is a useful temporal reading of the same seal semantics: completeness is not only "was the set whole?" but "was the set whole for the phase in which this decision remained actionable?" A decision record can stay auditable after it stops being spendable, so the completeness check should be able to distinguish:
That suggests a nice follow-on fixture family alongside your completeness vectors: same signed envelope shape, but with validity-window / phase metadata so the verifier can prove both sealed completeness and whether the record was complete during the authority phase that mattered. So I’d be very in favor of treating your completeness set as the new base row, then later adding a temporal-phase row over the same envelope: fresh, resumed, stale, expired-but-auditable, and revalidated completeness. Thanks for putting the vector set together — it fits the contract direction well. |
|
@vaaraio @nagasatish007 One follow-on split that may be worth preserving, if completeness stays in the conformance set, is structural completeness vs phase-valid completeness. Structural completeness asks:
Phase-valid completeness asks a different question:
That distinction matters because the same envelope can be complete enough to remain historically auditable while no longer being complete enough to safely spend as live authority. So I’d expect a later fixture family over the same signed shape to distinguish at least:
I don’t think this needs to expand the current PR scope, but if completeness is now part of the contract surface, that temporal split feels like the next useful row in the conformance matrix. |
|
@vaaraio credited your framing of the sealed-completeness axis in the set's README ( |
|
I would treat the sealed-completeness vectors as the base set, then add temporal phase as a separate conformance row rather than broadening the base contract. The verifier question is not only whether the retained sequence is whole. It is whether the sequence was whole during the phase in which a decision could still be spent. I would make the phase row small:
That keeps sealed completeness and temporal authority related but separate. The seal proves the set was whole. The phase check proves whether that whole set was still allowed to authorize execution at the point of use. Boundary: architecture and fixture-shape feedback only; no claim about using this project, running the branch, or validating implementation behavior. |
|
This shipped in v1.15.0 today. The proxy intercepts Config is one flag pair: {
"tools": {
"read_file": [{"arg": "path", "op": "in", "value": ["/tmp", "/var/data"]}]
}
}Each allowed call gets a JWT at Spec: SEP-2828. Corpus and verifier (no Vaara import needed): vaaraio/vaara.
|
|
Good proof point. The important property here is that the upstream tool never sees an uncredentialed constrained call. I would turn this into a small conformance row alongside the GovernanceDecision contract:
The key fixture is a constrained tool call with a valid credential, then the same call with one field changed. The first reaches the gateway. The second is blocked before the upstream tool sees it, and the resulting record explains which binding failed. That makes the release more than an implementation detail. It proves the contract can be enforced at the MCP boundary without making the application tool responsible for rediscovering authorization. |
|
@vaaraio Thanks — this is a strong reference implementation for the contract boundary discussed here. The important property is not only that a credential exists, but that it is bound to the attestation digest and argument commitment, independently verifiable downstream, and rejected before the constrained call reaches the upstream. That gives us a concrete implementation shape for the I would keep the credential and the execution receipt explicitly distinct in the contract:
The verifier and corpus being usable without importing Vaara is especially useful: it makes this viable as an external conformance target rather than a framework dependency. I’ll use the v1.15.0 behavior as a reference when tightening the corresponding LS fixture and cross-engine vectors. |
|
@vaaraio This is very useful. The minimal shared naming I’d suggest keeping stable across LS / Vaara-style fixtures is:
That gives us a clean boundary:
If v1.16.0 ships those cases as a corpus/verifier, I’d treat it as a reference fixture for the LS |
Summary
Adds a vendor-neutral
GovernanceDecisionTypedDict contract that crew-level governance hooks (before_tool_call/after_tool_callfrom #5890) can optionally return. Also addsGovernanceOutcomeas the linked post-execution record.This gives the hook one return shape that works for blocking, approval, resume, and audit — without coupling CrewAI to any vendor's evidence format.
Closes #5888 (governance decision contract portion)
Design Principles
extensions: dict[str, Any]is never validated by CrewAI. Vendors attach their evidence (signed receipts, Merkle proofs, etc.) under their own namespace keyGovernanceDecisionis pre-execution,GovernanceOutcomeis post-execution, linked viadecision_idFiles
lib/crewai/src/crewai/governance/governance_decision.pyGovernanceDecision+GovernanceOutcomeTypedDictslib/crewai/tests/governance/test_governance_decision_contract.pyContract Fields (GovernanceDecision)
Test Fixtures
allowdenyrequire_approvalexpires_atreviserevalidate_ifallowextensions["teec"]allowexecuteddecision_idExtension Round-Trip Test
The key test: an unknown vendor extension (
extensions["custom_vendor"]) with nested dicts, arrays, and unicode passes through JSON serialize → deserialize without any data loss or validation failure. This proves CrewAI doesn't filter, validate, or modify vendor evidence.Backward Compatibility
before_tool_call/after_tool_callbehaviorGovernanceDecisionfor richer audit/resume semanticsNoneorTrue, behavior is identical to todayCo-authored with
Co-authored-by: rpelevin rpelevin@users.noreply.github.com
References
before_tool_call/after_tool_callhooks (this PR extends)Summary by CodeRabbit
New Features
Tests