fix(serialization): DagParam in partial() of mapped task serializes stably #69091

Open: gingeekrishna wants to merge 1 commit into apache:main from gingeekrishna:fix/68941-dagparam-serialization-in-partial

@gingeekrishna

Summary

Fixes #68941

Problem

When DagParam is used as a kwarg in .partial() of a dynamically mapped task, BaseSerialization.serialize() had no branch for DagParam and fell through to cls.default_serialization(strict, var), which calls str(var). Because str(DagParam) includes the object's memory address (e.g. <DagParam object at 0x7f3a...>), the serialized partial_kwargs changed on every scheduler parse, so a new DAG version was written on each cycle. This is the DAG version inflation reported in #68941.

Reproducer:

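from airflow.sdk import DAG, task

with DAG(dag_id="repro", schedule=None) as dag:
    @task
    def add(value, label):
        return value

    # The DagParam goes into partial(); partial() and expand() cannot
    # both set the same argument, so the mapped argument uses another name.
    add.partial(label=dag.param("p", "default")).expand(value=[1, 2, 3])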
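
The root cause can be shown without Airflow. The sketch below uses a hypothetical FakeParam class (not the real DagParam) that, like any class without its own __str__ or __repr__, stringifies to a form containing its memory address. Two equal-looking instances therefore produce different strings, while an encoding built only from their fields is identical.

```python
class FakeParam:
    """Hypothetical stand-in for a class without __str__ or __repr__."""

    def __init__(self, name, default):
        self.name = name
        self.default = default


def naive_serialize(obj):
    # Mirrors the fallback described above: stringify anything unrecognised.
    return str(obj)


def stable_serialize(obj):
    # Encode only fields that do not change between parses.
    return {"name": obj.name, "default": obj.default}


# Both objects stay alive, so they cannot share a memory address.
first = FakeParam("p", "default")
second = FakeParam("p", "default")

naive_differs = naive_serialize(first) != naive_serialize(second)
stable_matches = stable_serialize(first) == stable_serialize(second)
```

Here naive_differs and stable_matches are both true, which is the difference between the old fallback and a field-based encoding.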

Fix

  1. enums.py: Add DAT.DAG_PARAM = "dag_param" to the type dispatch enum.
  2. serialized_objects.py:
    • Add DagParam to the imports.
    • Add a _DagParamRef NamedTuple (modelled on _XComRef) that acts as a stable placeholder during deserialization, holding dag_id, name, and default until the DAG is available.
    • Add a DagParam branch in BaseSerialization.serialize() that encodes the stable triple (dag_id, name, default) using the new DAT.DAG_PARAM type.
    • Add a DAT.DAG_PARAM case in BaseSerialization.deserialize() that returns a _DagParamRef placeholder.
    • Add OperatorSerialization._resolve_dag_param_refs() to recursively resolve placeholders once the DAG is hydrated.
    • Update set_task_dag_references() to call _resolve_dag_param_refs() on partial_kwargs for MappedOperator.
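
As a rough illustration of the serialize and deserialize branches listed above, here is a minimal sketch. It is not the code in this PR: StandInParam, ParamRef, the envelope keys, and the free functions serialize() and deserialize() are all hypothetical stand-ins chosen to keep the example self-contained.

```python
from typing import Any, NamedTuple

TYPE_KEY = "__type"  # illustrative envelope keys, assumed for this sketch
VAR_KEY = "__var"
DAG_PARAM = "dag_param"


class StandInParam:
    """Hypothetical stand-in for DagParam; holds only the stable triple."""

    def __init__(self, dag_id: str, name: str, default: Any = None) -> None:
        self.dag_id = dag_id
        self.name = name
        self.default = default


class ParamRef(NamedTuple):
    """Placeholder returned by deserialize() until a DAG is available."""

    dag_id: str
    name: str
    default: Any


def serialize(value: Any) -> Any:
    if isinstance(value, StandInParam):
        # Dedicated branch: emit the triple instead of falling back to str().
        triple = {"dag_id": value.dag_id, "name": value.name, "default": value.default}
        return {TYPE_KEY: DAG_PARAM, VAR_KEY: triple}
    if isinstance(value, dict):
        return {key: serialize(item) for key, item in value.items()}
    if isinstance(value, list):
        return [serialize(item) for item in value]
    return value


def deserialize(value: Any) -> Any:
    if isinstance(value, dict):
        if value.get(TYPE_KEY) == DAG_PARAM:
            # No DAG exists yet, so return a placeholder rather than a live object.
            return ParamRef(**value[VAR_KEY])
        return {key: deserialize(item) for key, item in value.items()}
    if isinstance(value, list):
        return [deserialize(item) for item in value]
    return value


partial_kwargs = {"value": StandInParam("repro", "p", "default"), "retries": 2}
encoded = serialize(partial_kwargs)
decoded = deserialize(encoded)
```

In this sketch, encoded contains no object address, and decoded["value"] is a ParamRef carrying the same dag_id, name, and default.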

Why _DagParamRef instead of deserializing immediately?

DagParam.__init__ requires a live DAG object (it runs current_dag.params[name] = default), but partial_kwargs are deserialized before the full DAG is assembled. This is the same deferred-resolution pattern already used by _XComRef for XComArg.
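
The two-phase ordering can be sketched as follows. Every name here (StandInDag, StandInParam, ParamRef, resolve_refs) is hypothetical and only approximates the behaviour described above; the real resolution in this PR lives in OperatorSerialization._resolve_dag_param_refs().

```python
from typing import Any, NamedTuple


class StandInDag:
    """Hypothetical stand-in for a DAG with a params mapping."""

    def __init__(self, dag_id: str) -> None:
        self.dag_id = dag_id
        self.params: dict = {}


class StandInParam:
    """Like the constructor described above, it registers itself on a live DAG."""

    def __init__(self, dag: StandInDag, name: str, default: Any = None) -> None:
        dag.params[name] = default
        self.current_dag = dag
        self.name = name
        self.default = default


class ParamRef(NamedTuple):
    dag_id: str
    name: str
    default: Any


def resolve_refs(value: Any, dag: StandInDag) -> Any:
    # Check the placeholder first: a NamedTuple is also a tuple.
    if isinstance(value, ParamRef):
        return StandInParam(dag, value.name, value.default)
    if isinstance(value, dict):
        return {key: resolve_refs(item, dag) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_refs(item, dag) for item in value]
    return value


# Phase 1: kwargs are decoded while no DAG object exists yet.
pending = {"value": ParamRef("repro", "p", "default"), "nested": [ParamRef("repro", "q", 1)]}

# Phase 2: the DAG is assembled, then placeholders are swapped for live objects.
dag = StandInDag("repro")
resolved = resolve_refs(pending, dag)
```

The recursion matters because a placeholder may sit inside a nested list or dict within partial_kwargs, not only at the top level.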

Tests

Four regression tests added to test_dag_serialization.py:

| Test | What it verifies |
| --- | --- |
| test_dagparam_in_partial_is_serialized_stably | No memory address in serialized partial_kwargs; dag_param encoding contains correct name/dag_id/default; _DagParamRef placeholder created on individual-op deserialization |
| test_dagparam_in_partial_roundtrip | Full DagSerialization.to_dict / from_dict cycle produces a live DagParam with correct _name, _default, and current_dag reference |
| test_dagparam_in_partial_version_stability | Two serializations of the same DAG produce identical output (no inflation) |
| test_dagparam_in_partial_no_default | DagParam with no default (NOTSET) round-trips correctly |
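
The version-stability check amounts to serializing twice and comparing. A minimal sketch of that idea, using a hypothetical build_payload() helper and a SHA-256 fingerprint over canonical JSON, is shown below. The fingerprint is only a convenient way to compare two outputs here; it is an assumption of this sketch and not a description of how Airflow computes DAG versions.

```python
import hashlib
import json


class StandInParam:
    """Hypothetical stand-in for DagParam; a fresh one is built per parse."""

    def __init__(self, dag_id, name, default):
        self.dag_id = dag_id
        self.name = name
        self.default = default


def build_payload():
    # Simulates one parse: a new object is created and encoded by its fields.
    param = StandInParam("repro", "p", "default")
    triple = {"dag_id": param.dag_id, "name": param.name, "default": param.default}
    return {"partial_kwargs": {"value": {"__type": "dag_param", "__var": triple}}}


def fingerprint(serialized):
    # sort_keys gives a canonical string, so key order cannot affect the hash.
    canonical = json.dumps(serialized, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


first_parse = fingerprint(build_payload())
second_parse = fingerprint(build_payload())
is_stable = first_parse == second_parse
```

With a field-based encoding, is_stable holds even though each call to build_payload() creates a new object.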

…tably

Fixes apache#68941

When DagParam is used as a kwarg in partial() of a dynamically mapped
task, BaseSerialization.serialize() had no branch for DagParam and fell
through to str(var), embedding a non-deterministic memory address in
partial_kwargs. This caused a new DAG version to be written on every
scheduler parse cycle.

Add DAT.DAG_PARAM to the type enum and handle DagParam in both
serialize() and deserialize(). During deserialization a _DagParamRef
placeholder (similar to _XComRef for XComArg) stores the stable
dag_id/name/default triple until set_task_dag_references() can attach
the live DAG reference once the DAG is fully hydrated.

Four regression tests cover: stable serialization (no memory address),
full round-trip (DagParam reconstructed with correct dag reference),
version stability (two serializations of same DAG are identical), and
NOTSET default preservation.
@gingeekrishna gingeekrishna requested a review from ashb as a code owner June 27, 2026 19:34
Copilot AI review requested due to automatic review settings June 27, 2026 19:34

Copilot AI left a comment

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

Development

Successfully merging this pull request may close these issues.

Dag Version Inflation: DagParam serialized with memory address if used in partial of a mapped task
