Validate pipeline checkpoints with Pydantic models #24180
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1df4318d92
        result[phase_id] = _CHECKPOINT_ADAPTER.validate_python(data)
    except ValidationError as e:
        raise CheckpointReadError(f"Checkpoint for phase {phase_id!r} in {self._path} is invalid: {e}") from e
Keep fresh runs from validating stale checkpoints
Because Phase.process_message unconditionally calls self._checkpoint_manager.read() when building the phase context, this validation now runs even when the orchestrator is started without resume. A valid-YAML but schema-invalid checkpoint left by an older/interrupted/manual run, such as a status: success entry missing the new tokens or memory_path fields, will abort a fresh run before it can overwrite the file, despite non-resume runs registering every phase from scratch. Consider restricting strict validation to resume paths or otherwise tolerating stale entries for fresh runs.
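One way to act on this suggestion is to make strictness a parameter of the read path. The sketch below is not the PR's code: `read_checkpoints`, its `strict` flag, and `toy_validate` are hypothetical names, and `CheckpointReadError` is redefined locally as a stand-in. It relies on the validator raising `ValueError`, which Pydantic's `ValidationError` subclasses.

```python
from typing import Any, Callable


class CheckpointReadError(Exception):
    """Local stand-in for the error type raised by the PR's read path."""


def read_checkpoints(
    raw: dict[str, Any],
    validate: Callable[[Any], Any],
    *,
    strict: bool,
) -> dict[str, Any]:
    """Validate each raw entry; raise on invalid entries only when strict.

    strict=True models a resume run, where a bad checkpoint must abort.
    strict=False models a fresh run, where stale entries are skipped
    because every phase will be registered from scratch anyway.
    """
    result: dict[str, Any] = {}
    for phase_id, data in raw.items():
        try:
            result[phase_id] = validate(data)
        except ValueError as e:
            if strict:
                raise CheckpointReadError(
                    f"Checkpoint for phase {phase_id!r} is invalid: {e}"
                ) from e
            # Fresh run: drop the stale entry instead of aborting.
    return result


def toy_validate(data: Any) -> Any:
    """Hypothetical validator: success entries need tokens and memory_path."""
    if not isinstance(data, dict):
        raise ValueError("entry is not a mapping")
    if data.get("status") == "success":
        missing = {"tokens", "memory_path"} - data.keys()
        if missing:
            raise ValueError(f"missing fields: {sorted(missing)}")
    return data


# A schema-invalid entry left behind by an older run.
stale = {"plan": {"status": "success"}}
fresh_result = read_checkpoints(stale, toy_validate, strict=False)
```

With `strict=False` the stale `plan` entry is skipped and the run can go on to overwrite the file; with `strict=True` the same input raises `CheckpointReadError`.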
Validation Report: All 21 validations passed.
What does this PR do?
Replaces the untyped dict[str, Any] checkpoint storage with Pydantic models (SuccessCheckpoint, FailedCheckpoint, CheckpointTokenInfo), validated on read via a discriminated union on the status field.
Key changes:
- CheckpointStatus is now a StrEnum instead of a plain string, eliminating isinstance(data, dict) guards scattered across the codebase
- CheckpointManager.read() returns dict[str, PhaseCheckpoint] and raises CheckpointReadError on any invalid entry (previously silently ignored)
- CheckpointManager.write_phase_checkpoint() now accepts a PhaseCheckpoint model instead of a raw dict
- base.py and agentic_phase.py construct typed models directly
Motivated by the review comment in #24164.
Motivation
The untyped checkpoint dict made it impossible to know the shape of a checkpoint without reading all write sites. The isinstance(data, dict) guard in successful_phases() was a symptom of this. Pydantic models make the schema explicit, catch corruption early, and let callers use attribute access instead of string keys.
Review checklist (to be filled by reviewers)
- Add the qa/required label if this PR needs QA validation, or qa/skip-qa if it does not. Exactly one of the two is required.
- Add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged.