Skip to content

Add workflow DAG reporting to nf-tower plugin#7218

Open
ewels wants to merge 3 commits into
masterfrom
dag-reporting-nf-tower
Open

Add workflow DAG reporting to nf-tower plugin#7218
ewels wants to merge 3 commits into
masterfrom
dag-reporting-nf-tower

Conversation

@ewels

@ewels ewels commented Jun 10, 2026

Copy link
Copy Markdown
Member

Summary

Serialize the run execution DAG faithfully and send it to Seqera Platform once at run start (on the begin request), so Platform (or any consumer) can render a live execution graph.

The plugin's responsibility is deliberately minimal and stable: it serializes session.getDag() and sends it as-is — no reduction, filtering or reinterpretation. All interpretation (collapsing operator/channel vertices to a process-only graph, building a subworkflow hierarchy, choosing a layout) happens consumer-side, where it can evolve without a Nextflow release. The DAG is small (KB-scale even for large pipelines), so it's sent whole.

What's in here

New core serializermodules/nextflow/src/main/groovy/nextflow/dag/DagSerializer.groovy

  • static Map toMap(DAG dag) mirrors the DAG model 1:1:
    {
      "dag": {
        "vertices": [ { "id": "<stable id>", "type": "PROCESS", "label": "<vertex label>", "scope": "<enclosing workflow>" } ],
        "edges":    [ { "source": "<id>", "target": "<id>", "label": "<optional channel label>" } ]
      }
    }
  • Every vertex type (PROCESS/OPERATOR/ORIGIN/NODE) and every edge is emitted — operator/channel vertices between processes are preserved, never dropped.
  • It lives in the core nextflow.dag package alongside the existing DOT/GEXF/Mermaid renderers because DAG.Type, addVertex and createVertex are @PackageScope — placing it there keeps it @CompileStatic-clean and lets the faithful unit test construct PROCESS+OPERATOR vertices. nf-tower just invokes it.

TowerObserver — adds renderDag(session) (null-safe; swallows errors so DAG reporting can never abort a run) and wires the result into makeBeginReq as an optional, additive top-level dag field (omitted when empty/unavailable). No existing fields changed.

Key decisions

  • Lifecycle point: onFlowBegin. Verified via ScriptRunner.run(): scriptLoader.runScript() builds the entire dataflow network (and thus the DAG) before session.fireDataflowNetwork()notifyFlowBegin(). CH.broadcast(), which runs after onFlowBegin, uses GPars' native queue.into()/MixOp and adds no NodeMarker DAG vertices — so the DAG is structurally complete at begin. Sent exactly once.
  • FQ process name source: Vertex.label directly. When a process runs inside a subworkflow, BindableDef clones it with prefix + SCOPE_SEP + name, so process.name (stored as Vertex.label) is already the fully-qualified name (e.g. RNASEQ:ALIGN:STAR). We also emit scope (Vertex.workflow, the enclosing subworkflow) so the consumer never has to guess. The stable edge join uses Vertex.id (the AtomicLong, stable across normalize() unlike the positional v0/v1 name).
  • normalize() before serializing — resolves channel-derived edge labels and fills ORIGIN/NODE boundary vertices, matching the file renderers. Idempotent, so it's safe alongside GraphObserver's own normalize() at completion.

Backwards compatibility

The dag field is optional and additive. Older Platform/consumers ignore the unknown field, and the plugin never fails if the server doesn't accept it. No existing fields changed.

Tests

./gradlew :plugins:nf-tower:test and :nextflow:test — all green.

  • DagSerializerTest (core): faithful vertex/edge serialization with an operator between two processes (asserts it's preserved), FQ name + scope exposure, boundary-vertex fill-in, null/empty handling.
  • TowerObserverTest: dag present in begin request, omitted when null, renderDag empty/exception null-safety, populated-DAG operator-chain check.
  • TowerJsonGeneratorTest: the dag map round-trips through the real wire generator with operators and FQ labels/scope intact.

E2E: ran a pipeline with a named subworkflow (SUB) via ./launch.sh … -with-dag — the DAG showed FQ names SUB:FOO/SUB:BAR, the intervening map operator, boundary vertices, and labeled edges (ch, collected) — exactly the data DagSerializer emits.

🤖 Generated with Claude Code

Serialize the run execution DAG faithfully and send it to Seqera Platform
once at run start, so consumers can render a live execution graph.

A new `nextflow.dag.DagSerializer` mirrors the DAG model one-to-one (every
vertex and edge, no filtering or reduction) into a plain map with the shape
`{vertices:[{id,type,label,scope}], edges:[{source,target,label}]}`. It lives
in the core `nextflow.dag` package alongside the existing DOT/GEXF/Mermaid
renderers, where `DAG.Type`/`addVertex`/`createVertex` (package-scoped) are
accessible. For PROCESS vertices `label` is the fully-qualified process name
(scope prefix included) and `scope` is the enclosing subworkflow, so the FQ
name stays reconstructable. Edges join via the stable per-vertex id.

`TowerObserver` serializes the DAG in `makeBeginReq` as an optional, additive
`dag` field on the begin payload. The DAG is complete at `onFlowBegin`: the
script body builds the full dataflow network before `fireDataflowNetwork`
fires the begin event, and `CH.broadcast()` (which runs afterwards) adds no
DAG vertices. Serialization is null-safe and swallows errors so DAG reporting
can never disrupt a run; older consumers ignore the unknown field.

Note: the consumer side (accepting `dag` on begin, persisting it, reducing and
serving it for the graph view) is a separate seqeralabs/platform PR; the two
should land together.

Signed-off-by: Phil Ewels <phil.ewels@seqera.io>
@netlify

netlify Bot commented Jun 10, 2026

Copy link
Copy Markdown

Deploy Preview for nextflow-docs-staging ready!

Name Link
🔨 Latest commit 178aabe
🔍 Latest deploy log https://app.netlify.com/projects/nextflow-docs-staging/deploys/6a2aa810c7bc330008086390
😎 Deploy Preview https://deploy-preview-7218--nextflow-docs-staging.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@pditommaso pditommaso left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good idea — sending the DAG faithfully at run start and leaving all interpretation to the consumer is the right call, and the design claims hold up against the code. I verified the key points locally:

  • DAG complete at onFlowBegin — confirmed: scriptLoader.runScript() builds the full dataflow network before session.fireDataflowNetwork(), and notifyFlowBegin() fires first there. The subsequent CH.broadcast() (queue bridging and topic MixOp) never calls NodeMarker.addOperatorNode, so no vertices are added after the snapshot. Serialization also happens synchronously on the main thread before igniters run, so no concurrent-mutation risk.
  • normalize() idempotency — confirmed, safe alongside GraphObserver's own normalize() at completion.
  • Wire formatTowerJsonGenerator's scheme map only truncates listed paths, it's not a whitelist, so the dag field passes through as-is.
  • Tests — ran DagSerializerTest, TowerObserverTest and TowerJsonGeneratorTest on this branch, all green. Coverage is genuinely good: the operator-preservation property is asserted at three layers (core serializer, observer, real wire generator round-trip).

To address

  • renderDag catches Exception, but the DAG invariant throws AssertionError. DAG.normalizeMissingVertices() enforces its invariant with a Groovy assert (DAG.groovy:314), which throws AssertionError — an Error, not an Exception — so it would escape the catch and abort the run, contradicting the stated "can never abort a run" contract. Unlikely in practice, but worth catching Throwable (or Exception | AssertionError) to make the contract bulletproof.

Nice to have (non-blocking)

  • "Stable id" wording is slightly misleading. Vertex.nextID is a static JVM-global AtomicLong — ids aren't deterministic across runs and won't match between a run and its -resume. They're only stable within one payload, which is sufficient for edge joins, but a one-word doc tweak ("payload-scoped id") would prevent a consumer treating them as cross-run identity.
  • DagSerializer.toMap() mutates its argument via normalize() — a side effect in a method named toMap. Documented and idempotent, so acceptable, but a serialize(DAG) name or an explicit note would make the mutation harder to miss.
  • Raw LinkedHashMap/Map types in DagSerializerMap<String,Object> would be cleaner under @CompileStatic.
  • entry.scope = v.workflow ?: null deliberately converts the empty-string default from createVertex to null — correct, but a short comment would explain the ?:.

…docs

- renderDag now catches Throwable, not just Exception: DAG.normalize()
  enforces its invariant with a Groovy assert (an AssertionError), so the
  'DAG reporting can never abort a run' contract now holds for that case too.
- Clarify that vertex ids are payload-scoped (static JVM-global counter,
  not deterministic across runs or a -resume), not cross-run stable.
- Document the in-place normalize() side effect of DagSerializer.toMap().
- Parameterize raw Map/LinkedHashMap types as Map<String,Object>.
- Comment the empty-string-to-null scope coercion.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Phil Ewels <phil.ewels@seqera.io>
@ewels

ewels commented Jun 11, 2026

Copy link
Copy Markdown
Member Author

Thanks for the thorough review — all points addressed in f65bf8e.

To address

  • renderDag now catches Throwable. You're right that DAG.normalize()normalizeMissingVertices() enforces its invariant with a Groovy assert, which throws AssertionError (an Error, not an Exception) and would have escaped the old catch(Exception). Switched to catch(Throwable) with a comment explaining why, so the "DAG reporting can never abort a run" contract holds even in that case.

Nice to have

  • "Stable id" wording. Reworded the javadoc to "payload-scoped per-vertex id" and added a note that it comes from a static JVM-global AtomicLong, so it's not deterministic across runs and won't match between a run and its -resume — identity only within a single payload.
  • toMap() side effect. Added a Side effect note in the javadoc spelling out the in-place normalize() mutation and that idempotency keeps it safe alongside GraphObserver. Kept the toMap name (renaming touches the call site plus three test layers) since the documented-note option was acceptable.
  • Raw types. toMap now returns Map<String,Object> and the internal collections are parameterized too.
  • Empty-string scope coercion. Added a comment explaining that createVertex defaults workflow to "" and the ?: coerces that root scope back to null.

DagSerializerTest, TowerObserverTest and TowerJsonGeneratorTest all still green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Phil Ewels <phil.ewels@seqera.io>
@ewels ewels requested a review from a team as a code owner June 11, 2026 12:20

@bentsherman bentsherman left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Nextflow should just send a map of process name -> upstream processes. Everything else (operators, channel names, etc) are internal details that are not appropriate for exposure via API. That is, it limits our ability to change the DAG structure because external systems like platform are now dependent on it

@ewels

ewels commented Jun 11, 2026

Copy link
Copy Markdown
Member Author

Hah, I started off with an implementation like that and then rolled it back to be generic instead. Basically the same logic, but in reverse - I was thinking if we just sent the whole DAG then Nextflow could be agnostic and we would be better future proofed for anything we want to do without needing further Nextflow changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants