Validate pipeline checkpoints with Pydantic models by luisorofino · Pull Request #24180 · DataDog/integrations-core

luisorofino · 2026-06-25T11:02:50Z

What does this PR do?

Replaces the untyped dict[str, Any] checkpoint storage with Pydantic models (SuccessCheckpoint, FailedCheckpoint, CheckpointTokenInfo), validated on read via a discriminated union on the status field.

Key changes:

CheckpointStatus is now a StrEnum instead of a plain string, eliminating isinstance(data, dict) guards scattered across the codebase
CheckpointManager.read() returns dict[str, PhaseCheckpoint] and raises CheckpointReadError on any invalid entry (previously silently ignored)
CheckpointManager.write_phase_checkpoint() now accepts a PhaseCheckpoint model instead of a raw dict
Call sites in base.py and agentic_phase.py construct typed models directly
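
For readers unfamiliar with the pattern, the changes above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code. The class names, the `status` discriminator, the `tokens` and `memory_path` fields, and the `_CHECKPOINT_ADAPTER.validate_python` call come from this PR page. The fields of `CheckpointTokenInfo` and `FailedCheckpoint`, the `read_checkpoints` helper, and the default path are assumptions. The PR uses a StrEnum for the status; plain string literals stand in for it here.

```python
from typing import Annotated, Any, Literal, Union

from pydantic import BaseModel, Field, TypeAdapter, ValidationError


class CheckpointTokenInfo(BaseModel):
    # Field names are assumptions for illustration only.
    input_tokens: int = 0
    output_tokens: int = 0


class SuccessCheckpoint(BaseModel):
    status: Literal["success"]
    tokens: CheckpointTokenInfo
    memory_path: str


class FailedCheckpoint(BaseModel):
    status: Literal["failed"]
    error: str  # hypothetical field, not confirmed by the PR


# Discriminated union: pydantic picks the model from the "status" value.
PhaseCheckpoint = Annotated[
    Union[SuccessCheckpoint, FailedCheckpoint],
    Field(discriminator="status"),
]

_CHECKPOINT_ADAPTER = TypeAdapter(PhaseCheckpoint)


class CheckpointReadError(Exception):
    """Raised when a stored checkpoint entry fails validation."""


def read_checkpoints(raw: dict[str, Any], path: str = "checkpoints.yaml") -> dict[str, Any]:
    # Hypothetical stand-in for CheckpointManager.read(): validate every
    # entry and fail loudly instead of silently skipping bad ones.
    result = {}
    for phase_id, data in raw.items():
        try:
            result[phase_id] = _CHECKPOINT_ADAPTER.validate_python(data)
        except ValidationError as e:
            raise CheckpointReadError(
                f"Checkpoint for phase {phase_id!r} in {path} is invalid: {e}"
            ) from e
    return result
```

With models like these, callers read `checkpoint.status` or `checkpoint.tokens` instead of `checkpoint["status"]`, and an entry with an unknown status or a missing required field surfaces as a `CheckpointReadError` at read time.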

Motivated by the review comment in #24164: #24164 (comment)

Note for reviewers: Several test files (test_base.py, test_agentic_phase.py, test_inspect_endpoint.py, test_orchestrator.py) have mechanical changes to use attribute access (checkpoint.status, checkpoint.tokens) instead of dict subscripting (checkpoint["status"]). Their assertions and coverage are unchanged.

Motivation

The untyped checkpoint dict made it impossible to know the shape of a checkpoint without reading all write sites. The isinstance(data, dict) guard in successful_phases() was a symptom of this. Pydantic models make the schema explicit, catch corruption early, and let callers use attribute access instead of string keys.

Review checklist (to be filled by reviewers)

Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
Add qa/required if this PR needs QA validation, or qa/skip-qa if it does not. Exactly one of the two is required.
If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1df4318d92


chatgpt-codex-connector · 2026-06-25T11:09:27Z

+                result[phase_id] = _CHECKPOINT_ADAPTER.validate_python(data)
+            except ValidationError as e:
+                raise CheckpointReadError(f"Checkpoint for phase {phase_id!r} in {self._path} is invalid: {e}") from e


Keep fresh runs from validating stale checkpoints

Because Phase.process_message unconditionally calls self._checkpoint_manager.read() when building the phase context, this validation now runs even when the orchestrator is started without resume. A valid-YAML but schema-invalid checkpoint left by an older/interrupted/manual run, such as a status: success entry missing the new tokens or memory_path fields, will abort a fresh run before it can overwrite the file, despite non-resume runs registering every phase from scratch. Consider restricting strict validation to resume paths or otherwise tolerating stale entries for fresh runs.
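
One possible shape for this suggestion is sketched below, assuming a hypothetical `strict` flag that the orchestrator would set only on resume. The `read_checkpoints` function, the `validate` parameter, and the `require_success_fields` stand-in validator are illustrative and not part of the PR. In the real code the validator would presumably be the Pydantic adapter, whose `ValidationError` is a `ValueError` subclass.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


class CheckpointReadError(Exception):
    """Raised when a stored checkpoint entry fails validation."""


def read_checkpoints(
    raw: dict[str, Any],
    validate: Callable[[Any], Any],
    *,
    strict: bool,
) -> dict[str, Any]:
    # strict=True (resume): any invalid entry aborts the read.
    # strict=False (fresh run): invalid entries are logged and dropped,
    # since the run will overwrite them anyway.
    result = {}
    for phase_id, data in raw.items():
        try:
            result[phase_id] = validate(data)
        except ValueError as e:
            if strict:
                raise CheckpointReadError(
                    f"Checkpoint for phase {phase_id!r} is invalid: {e}"
                ) from e
            logger.warning("Ignoring stale checkpoint for phase %r: %s", phase_id, e)
    return result


def require_success_fields(data: Any) -> Any:
    # Stand-in validator (assumption): checks only that the fields named
    # in the review comment are present.
    if not isinstance(data, dict):
        raise ValueError("entry is not a mapping")
    missing = {"status", "tokens", "memory_path"} - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return data
```

Under this sketch, a stale `status: success` entry without `tokens` or `memory_path` would be dropped with a warning on a fresh run but would still raise on resume, where the stored data is actually relied on.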


dd-octo-sts · 2026-06-25T11:17:33Z

Validation Report

All 21 validations passed.


Validation	Description	Status
`agent-reqs`	Verify check versions match the Agent requirements file	✅
`ci`	Validate CI configuration and code coverage settings	✅
`codeowners`	Validate every integration has a CODEOWNERS entry	✅
`config`	Validate default configuration files against spec.yaml	✅
`dep`	Verify dependency pins are consistent and Agent-compatible	✅
`http`	Validate integrations use the HTTP wrapper correctly	✅
`imports`	Validate check imports do not use deprecated modules	✅
`integration-style`	Validate check code style conventions	✅
`jmx-metrics`	Validate JMX metrics definition files and config	✅
`labeler`	Validate PR labeler config matches integration directories	✅
`legacy-signature`	Validate no integration uses the legacy Agent check signature	✅
`license-headers`	Validate Python files have proper license headers	✅
`licenses`	Validate third-party license attribution list	✅
`metadata`	Validate metadata.csv metric definitions	✅
`models`	Validate configuration data models match spec.yaml	✅
`openmetrics`	Validate OpenMetrics integrations disable the metric limit	✅
`package`	Validate Python package metadata and naming	✅
`qa-label`	Validate the pull request declares whether it needs QA for the next Agent release	✅
`readmes`	Validate README files have required sections	✅
`saved-views`	Validate saved view JSON file structure and fields	✅
`version`	Validate version consistency between package and changelog	✅


datadog-datadog-prod-us1-2 · 2026-06-25T11:39:57Z


⚠️ Warnings

🚦 1 Pipeline job failed

PR All | test / j06ca546 / SNMP

🧪 1 Test failed in 1 job

PR All | run

All test failures are known flaky — job may pass on retry.

❄️ Known flaky: test_e2e_snmp_listener from test_e2e_snmp_listener.py

Needed at least 1 candidates for 'datadog.snmp.check_duration', got 0
Expected:
        MetricStub(name='datadog.snmp.check_duration', type=0, value=None, tags=['autodiscovery_subnet:172.18.0.0/28', 'device_vendor:apc', 'firmware_version:2.0.3-test', 'loader:python', 'model:APC Smart-UPS 600', 'serial_num:test_serial', 'snmp_device:172.18.0.1', 'snmp_profile:apc_ups', 'ups_name:testIdentName'], hostname=None, device=None, flush_first_value=None)
Difference to closest:
        Expected tag snmp_device:172.18.0.1
        Found snmp_device:172.18.0.2

Similar submitted:
Score   Most similar
1.00    MetricStub(name='datadog.snmp.check_duration', type=0, value=0.21761059761047363, tags=['autodiscovery_subnet:172.18.0.0/28', 'device_vendor:apc', 'firmware_version:2.0.3-test', 'loader:python', 'model:APC Smart-UPS 600', 'serial_num:test_serial', 'snmp_device:172.18.0.2', 'snmp_profile:apc_ups', 'ups_name:testIdentName'], hostname='runnervm08nci', device=None, flush_first_value=False)
...

Not introduced in this PR.

ℹ️ Info

No other issues found (see more)

❄️ No new flaky tests detected

🎯 Code Coverage (details)
• Patch Coverage: 100.00%
• Overall Coverage: 88.46% (+0.00%)


This comment will be updated automatically if new data arrives.

🔗 Commit SHA: b4e393a

Add checkpoint validation from Pydantic

1df4318

luisorofino added the qa/skip-qa label Jun 25, 2026

luisorofino requested a review from a team as a code owner June 25, 2026 11:02

dd-octo-sts Bot added the ddev label Jun 25, 2026

luisorofino marked this pull request as draft June 25, 2026 11:03

dd-octo-sts Bot added the team/agent-integrations label Jun 25, 2026

luisorofino changed the title ~~Add checkpoint validation from Pydantic~~ Validate pipeline checkpoints with Pydantic models Jun 25, 2026

chatgpt-codex-connector Bot reviewed Jun 25, 2026


Rename TokenUsage to CheckpointTokenInfo

b4e393a

luisorofino marked this pull request as ready for review June 25, 2026 11:16

luisorofino mentioned this pull request Jun 25, 2026

Add resume support to the AI flow orchestrator #24164

Open

3 tasks

luisorofino wants to merge 2 commits into loa/orchestrator-resume from loa/checkpoints-pydantic-validation