ADR: Remote pipeline inclusion#7213

Open
bentsherman wants to merge 5 commits into master from adr-meta-pipelines

@bentsherman

Member

This PR adds an ADR for remote pipeline inclusion, aka "meta-pipelines".

It describes an approach for including remote pipelines into a meta-pipeline in a way that preserves dataflow concurrency between pipeline inputs/outputs.

It discusses alternative approaches such as pipeline chaining / nf-cascade and why they don't satisfy certain use cases (preserving dataflow concurrency).

It also walks through a basic example of fetchngs -> rnaseq.

Signed-off-by: Ben Sherman <bentshermann@gmail.com>
@bentsherman bentsherman requested review from ewels and pditommaso June 10, 2026 00:19
@netlify

netlify Bot commented Jun 10, 2026


Deploy Preview for nextflow-docs-staging ready!

Name Link
🔨 Latest commit 09c9e96
🔍 Latest deploy log https://app.netlify.com/projects/nextflow-docs-staging/deploys/6a2ae8fb06f1bf00081bb8b5
😎 Deploy Preview https://deploy-preview-7213--nextflow-docs-staging.netlify.app

@bentsherman bentsherman added this to the 26.10 milestone Jun 10, 2026
@ewels

ewels commented Jun 10, 2026

Member

Great write up, thanks for this Ben!

As you might expect, I'm most concerned about the params. You characterise it as a one-off cost which is mitigated by LLMs, however that doesn't take into account updates to included pipelines (a core functionality with included modules). The params drift with updates would be dangerous and a constant source of dev work.

I'd still love to look into how we could bulk import nested config and apply it at root level. Even if it is a separate import + apply mechanism (eg. like config profiles in a sense?). I think without it, the use of the meta pipeline functionality is substantially limited.

Comment on lines +58 to +61
3. No use of project-level assets (`projectDir`, `bin`, `lib`) within the core workflow. Module-level assets can be used through the module `resources/` bundle and `moduleDir`.
4. Declare software dependencies (`container`, `conda`) in the process definition, not in config.
5. No default `ext` settings in config -- specify these defaults in the process definition or use explicit process inputs. Otherwise, any default `ext` settings must be replicated manually in the meta-pipeline.
6. No plugin functions within the core workflow.

@jorgee jorgee Jun 10, 2026

Contributor

It's not clear to me what some of these best practices mean or what goes wrong if they aren't followed; it might be good to add an example.

Member Author

Following these guidelines means that when you include the core workflow and its dependent modules/subworkflows, it is self-contained.

For example:

  • if the core workflow uses project-level assets like bin or lib, I have to remember to copy them into the meta-pipeline
  • if the core workflow uses a param directly and I import that into the meta-pipeline, I have to remember to define the same param (with the same meaning) in the meta-pipeline
  • and so on
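
To make the "self-contained" idea a bit more concrete, here is a rough sketch of the kind of check a tool could run on a core workflow before it is included. It is a toy Python model, not part of Nextflow or nf-core tooling: the `find_leaks` helper, the two patterns, and the sample workflow text are all hypothetical.

```python
import re

# Hypothetical patterns for things that tie a core workflow to its own project.
PATTERNS = {
    "project-level asset": re.compile(r"\bprojectDir\b"),
    "direct param access": re.compile(r"\bparams\.\w+"),
}

def find_leaks(source):
    """Return (line number, label) pairs for lines that reach outside the workflow."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, label))
    return findings

# Made-up workflow text, used only to exercise the check.
source = (
    "workflow CORE {\n"
    "    take:\n"
    "    samples\n"
    "    main:\n"
    "    ref = file(\"${projectDir}/assets/ref.fa\")\n"
    "    ALIGN(samples, params.aligner)\n"
    "}\n"
)
leaks = find_leaks(source)
```

Each finding is something the meta-pipeline author would otherwise have to remember to copy or redefine by hand.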

```
> results/output-rnaseq.json
```

While pipeline chaining has always been possible in theory, new language features such as [workflow outputs](20251020-workflow-outputs.md) and [record types](20260306-record-types.md) make it much more practical. Each pipeline can define a structured output which can be passed to the next pipeline via JSON. Mismatches between an upstream output and downstream input (e.g. missing columns, different column names) can be resolved by a small adapter pipeline.
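
The adapter step mentioned above can be pictured with a small Python sketch. This is only an illustration of the idea (rename mismatched columns, fill in missing ones) applied to a JSON list of records; the column names, the default value, and the `adapt` helper are assumptions, not the actual fetchngs or rnaseq schema.

```python
import json

# Hypothetical mismatches between an upstream output and a downstream input.
RENAMES = {"fastq_1": "reads_1", "fastq_2": "reads_2"}
DEFAULTS = {"strandedness": "auto"}

def adapt(records, renames=RENAMES, defaults=DEFAULTS):
    """Rename mismatched columns and fill in columns the upstream omits."""
    adapted = []
    for record in records:
        row = {renames.get(key, key): value for key, value in record.items()}
        for key, value in defaults.items():
            row.setdefault(key, value)
        adapted.append(row)
    return adapted

# Stand-in for the JSON file written by the upstream pipeline.
upstream = json.loads(
    '[{"sample": "S1", "fastq_1": "S1_R1.fq.gz", "fastq_2": "S1_R2.fq.gz"}]'
)
downstream = adapt(upstream)
```

Note that the adapter only runs once the upstream JSON is complete, which is the batch boundary that chaining implies.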

Contributor

I remain to be convinced of the point of pipeline chaining if we can trivially make meta pipelines.

Collaborator

Agreed, I think pipeline chaining is used because metapipelines don't work right now. If they did, the number of pipeline chains would drop.

That's not to say they're never useful, but it's much less common.

Two main use cases:

  • Run major pipeline (sarek, rnaseq) and add a few auxiliary processes
  • Daisy chain two pipelines (fetchngs -> rnaseq)

Both are solved better by metapipelines than pipeline chaining.

The main use case for daisy chaining is actually wiring Nextflow up to non-Nextflow tools, e.g. Nextflow into an ETL system. In this case structured inputs and outputs are still very useful.

Member Author

at this point the value prop of pipeline chaining appears to be low development overhead (just plug A into B)

Contributor

Well, chaining has development overhead too; it's quite a faff. All we have to do is bring meta-pipeline dev under that faff level.

Comment thread adr/20260608-remote-pipeline-inclusion.md
@pinin4fjords

Contributor

> As you might expect, I'm most concerned about the params.

Agreed. Feel like we need some sort of auto-import of the params of child workflows, so e.g. they appear automatically in Platform, and I could say e.g. meta.rnaseq.pseudoaligner = 'kallisto' in the meta pipeline's nextflow.config to override.

Then some auto-assembly of docs as well.

Basically we need to standardise at the nextflow level where a bunch of the non-nextflow pieces need to live.

3. No use of project-level assets (`projectDir`, `bin`, `lib`) within the core workflow. Module-level assets can be used through the module `resources/` bundle and `moduleDir`.
4. Declare software dependencies (`container`, `conda`) in the process definition, not in config.
5. No default `ext` settings in config -- specify these defaults in the process definition or use explicit process inputs. Otherwise, any default `ext` settings must be replicated manually in the meta-pipeline.
6. No plugin functions within the core workflow.

Collaborator

Plugin support feels like a requirement, functions like a webhook or logging statement could be critical for the workflow. The main challenge might be supporting multiple versions (e.g. WORKFLOW1 uses plugin@1.2.3 and WORKFLOW2 uses plugin@2.4.1), but maybe we can just say "ONE PLUGIN ONLY"
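
The multi-version concern can be sketched as a simple check over the plugin versions each included workflow asks for. This is a toy Python model with made-up data; `plugin_conflicts` is a hypothetical helper, not an existing Nextflow feature.

```python
def plugin_conflicts(requirements):
    """Group requested versions per plugin and keep plugins with more than one version."""
    versions = {}
    for workflow, plugins in requirements.items():
        for name, version in plugins.items():
            versions.setdefault(name, {}).setdefault(version, []).append(workflow)
    return {name: by_version for name, by_version in versions.items() if len(by_version) > 1}

# Made-up requirements mirroring the example in the comment above.
requirements = {
    "WORKFLOW1": {"plugin": "1.2.3"},
    "WORKFLOW2": {"plugin": "2.4.1", "other": "0.1.0"},
}
conflicts = plugin_conflicts(requirements)
```

A "one version only" rule would then amount to rejecting any meta-pipeline for which this mapping is non-empty.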

Member Author

plugins for webhooks / logging typically live outside the core workflow. so the meta-pipeline would just import the core workflow logic and decide whether to include those plugins in its own shell

I have yet to see a plugin that is actually used in a workflow's core logic, although it's certainly possible. Most plugins provide third-party integrations at the pipeline boundary

Collaborator

I agree but this might become more popular with the plugin registry + vibe coding.

Sounds like premature optimization on my part; easier to just tell people to be careful and deal with it if it becomes a problem.

Member Author

sure, that's why I call them out as best practices instead of hard rules. you can use a plugin function as long as you remember to declare it in the meta-pipeline config

2. No `publishDir` -- use the `output` block.
3. No use of project-level assets (`projectDir`, `bin`, `lib`) within the core workflow. Module-level assets can be used through the module `resources/` bundle and `moduleDir`.
4. Declare software dependencies (`container`, `conda`) in the process definition, not in config.
5. No default `ext` settings in config -- specify these defaults in the process definition or use explicit process inputs. Otherwise, any default `ext` settings must be replicated manually in the meta-pipeline.

Collaborator

ext.args is soooooo powerful, yet clearly breaks the interface contract for processes.

I still think we should promote args to a directive and it will solve a number of these issues (process.args) 😉 .

// process definition (proposed `args` directive, not existing syntax)
process my_process {
    args "--concise"
    // etc...
}

// main.nf
my_process(ch_inputs, args: "--verbose")

// nextflow.config
process {
    withName: 'my_process' {
        args = "--verbose"
    }
}

@bentsherman bentsherman Jun 10, 2026

Member Author

both ext.args and process.args can work, as long as the default value for the arg is defined in the process definition rather than in config

the core problem is that when I import a workflow, Nextflow doesn't know which config is "tied" to that workflow
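
A toy Python model of why it matters where the default lives. The precedence shown (call site, then config, then process definition) is an assumption made for illustration, and `resolve_args` is a hypothetical helper, not how Nextflow resolves directives.

```python
def resolve_args(definition_default, config_override=None, call_override=None):
    """Assumed precedence: call site, then config, then the process definition."""
    for candidate in (call_override, config_override):
        if candidate is not None:
            return candidate
    return definition_default

# An importing project that ships no config still gets the intended default,
# because the default travels with the process definition.
imported_without_config = resolve_args("--concise")
imported_with_override = resolve_args("--concise", config_override="--verbose")
```

If the default lived only in the original pipeline's config, the first call would have nothing to fall back on once the workflow is imported elsewhere.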

5. No default `ext` settings in config -- specify these defaults in the process definition or use explicit process inputs. Otherwise, any default `ext` settings must be replicated manually in the meta-pipeline.
6. No plugin functions within the core workflow.

For process directives, it is helpful to distinguish *what* is executed vs *how* it is executed. Directives that affect the *what* (`container`, `ext` settings) should be owned by the process definition. Directives that affect the *how* (`cpus`, `memory`, `executor`, `queue`, `errorStrategy`) should be owned by the meta-pipeline.

Collaborator

I don't understand the distinction here.

Member Author

in other words, some directives affect the task result while others don't
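
One way to picture the split, as a toy Python sketch. The two sets simply restate the directives named in the ADR excerpt above; the `split_directives` helper and the sample values are hypothetical.

```python
# Directives named in the ADR text: "what" is executed vs "how" it is executed.
WHAT = {"container", "conda", "ext"}
HOW = {"cpus", "memory", "executor", "queue", "errorStrategy"}

def split_directives(directives):
    """Partition directives by owner and report any that the two sets do not cover."""
    owned_by_process = {k: v for k, v in directives.items() if k in WHAT}
    owned_by_meta = {k: v for k, v in directives.items() if k in HOW}
    unclassified = sorted(set(directives) - WHAT - HOW)
    return owned_by_process, owned_by_meta, unclassified

# Made-up directive values for a single process.
directives = {
    "container": "example/tool:1.0",
    "ext": {"args": "--concise"},
    "cpus": 4,
    "memory": "8 GB",
    "label": "process_low",
}
by_process, by_meta, unclassified = split_directives(directives)
```

Changing anything in `by_process` can change the task result; changing anything in `by_meta` should only change where and how the task runs.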


Alternatively, these core plugin dependencies could be specified in the pipeline spec under `requires.plugins`. When installing a pipeline, Nextflow could copy these plugin declarations into the meta-pipeline config and/or spec.

Since this use case is rare -- plugin functions are typically used in the entry workflow outside the core workflow -- it can be deferred in the first iteration.

Collaborator

With more private plugin registries, I expect more utility methods in plugins (e.g. updateLims(sampleId, status)), but maybe this is premature optimization.

Member Author

A LIMS integration sounds like something that could live outside the core workflow


The Nextflow-in-Nextflow approach treats the included pipeline as a *black box* -- it preserves the exact pipeline behavior (core workflow + entry workflow + config) while forfeiting dataflow composition (separate dataflow graphs).
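
The difference between one dataflow graph and separate graphs can be illustrated with a toy Python model built on generators. It only models the ordering of events, not the Nextflow runtime; `fetch`, `analyse`, `composed`, and `chained` are hypothetical names.

```python
def fetch(ids, log):
    """Upstream stage: emits samples one at a time."""
    for sample_id in ids:
        log.append(f"fetch {sample_id}")
        yield sample_id

def analyse(samples, log):
    """Downstream stage: consumes samples as they arrive."""
    for sample_id in samples:
        log.append(f"analyse {sample_id}")
        yield sample_id

def composed(ids):
    """Single graph: each sample flows downstream as soon as it is fetched."""
    log = []
    list(analyse(fetch(ids, log), log))
    return log

def chained(ids):
    """Separate graphs: the upstream run must finish before the downstream run starts."""
    log = []
    fetched = list(fetch(ids, log))  # barrier between the two runs
    list(analyse(fetched, log))
    return log

composed_order = composed(["A", "B"])
chained_order = chained(["A", "B"])
```

In the composed case, analysis of `A` is logged before `B` is fetched; in the chained case, every fetch is logged before any analysis.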

An ideal solution might combine the best of both: compose pipelines into a single dataflow graph (white box) while inheriting each pipeline's params, outputs, and config so they need not be replicated (black box). We considered such a model, where an included pipeline contributes its shell as namespaced, overridable defaults, but rejected it. Dataflow composition fundamentally requires exposing the core workflow as a set of channel ports, so the white-box mechanism is unavoidable; inheritance would only layer implicit behavior on top of it. That behavior comes at a steep cost: it relocates a one-time *write* cost (boilerplate) into a recurring *read* cost (hidden defaults, auto-bound arguments, auto-published outputs), burdens every tool that must now understand it (linter, type checker, config resolution, resume), and conflicts with the frozen-island philosophy that otherwise governs vendored code.

Collaborator

I agree with this. The added complexity is enormous.

Member Author

@ewels @pinin4fjords @adamrtalbot

Pulling everyone into this thread to talk about auto-inheritance

> As you might expect, I'm most concerned about the params. You characterise it as a one-off cost which is mitigated by LLMs, however that doesn't take into account updates to included pipelines (a core functionality with included modules). The params drift with updates would be dangerous and a constant source of dev work.

That's fair, but not my main point. The core problem is this -- if you want to preserve dataflow concurrency between pipelines, then you can't really just auto-import params into the meta-pipeline. You have to define which params are replaced with inter-pipeline wiring vs exposed to the top-level. That amounts to just writing the meta-workflow.

The development overhead is what it is. I suggest the AI skill just as an idea. I'm sure it could also handle updates. All of that is better than having loads of hidden behavior that makes the meta-pipeline impossible to reason about
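
The wiring decision can be sketched as a toy Python model: given the params an included pipeline declares, the meta-pipeline author still has to say which ones are fed by upstream outputs and which stay exposed at the top level. The param names, defaults, and the `plan_params` helper are hypothetical.

```python
def plan_params(child_params, wired):
    """Split a child's params into those exposed at the top level and those replaced by wiring."""
    exposed = {name: default for name, default in child_params.items() if name not in wired}
    unknown = sorted(set(wired) - set(child_params))  # wiring that targets a param the child lacks
    return exposed, unknown

# Made-up params for an included pipeline; `input` comes from the upstream workflow.
child_params = {"input": None, "aligner": "star", "fasta": None}
exposed, unknown = plan_params(child_params, wired={"input"})
```

The `wired` set is exactly the information that cannot be inferred automatically, which is why auto-import alone does not produce a working meta-workflow.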

> I'd still love to look into how we could bulk import nested config and apply it at root level. Even if it is a separate import + apply mechanism (eg. like config profiles in a sense?). I think without it, the use of the meta pipeline functionality is substantially limited.

Not sure I understand this point. Most of the config is just standard boilerplate, so it doesn't make sense to auto-import it because you will just get lots of duplicate config

Unless you are talking about ext config. That will depend on whether we can move the default ext settings into the process definition

Member Author

Building on what Adam said:

> In a scenario where I update my workflow from v1.1 to v1.2, an update to params should be explicit in the input block, not implicit, and I hope it doesn't change too much.

The nice thing about an explicit meta-pipeline definition is that when I update the included pipeline, the linter / language server will immediately pick up on any inconsistencies, because it's just regular code. I'm not sure the tooling would be able to do that if there was a lot of implicit behavior
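
As a rough picture of what "the linter will pick it up" means, here is a toy Python check that compares an explicit call site against the inputs a workflow declares. It is not the actual linter or language server logic; the input names and `check_call` are hypothetical.

```python
def check_call(declared_takes, call_args):
    """Report inputs the workflow declares but the call omits, and arguments it does not declare."""
    missing = [name for name in declared_takes if name not in call_args]
    unexpected = [name for name in call_args if name not in declared_takes]
    return missing, unexpected

# Made-up interface change between two versions of an included workflow.
takes_v1_1 = ["samples", "aligner"]
takes_v1_2 = ["samples", "aligner", "fasta"]
call_args = {"samples": "ch_samples", "aligner": "params.aligner"}

before_update = check_call(takes_v1_1, call_args)
after_update = check_call(takes_v1_2, call_args)
```

Because the call is ordinary code, the mismatch shows up at the call site as soon as the included workflow is updated.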

}
// perform RNAseq analysis
multiqc_report = NFCORE_RNASEQ( ch_samples )

Collaborator

Side note - I would remove MultiQC from all nf-core pipelines and put them in the metapipelines, i.e. no MultiQC repeats, but that's a matter of opinion.

FETCHNGS(ch_inputs)
RNASEQ(FETCHNGS.out)
MULTIQC(RNASEQ.out.qc_files)

Member Author

I was wondering about that. Wasn't sure if you would want a meta-pipeline to produce one multiqc report per pipeline or just one for the whole thing

Comment thread adr/20260608-remote-pipeline-inclusion.md Outdated
@adamrtalbot

adamrtalbot commented Jun 10, 2026

Collaborator

> > As you might expect, I'm most concerned about the params.
>
> Agreed. Feel like we need some sort of auto-import of the params of child workflows, so e.g. they appear automatically in Platform, and I could say e.g. meta.rnaseq.pseudoaligner = 'kallisto' in the meta pipeline's nextflow.config to override.
>
> Then some auto-assembly of docs as well.
>
> Basically we need to standardise at the nextflow level where a bunch of the non-nextflow pieces need to live.

I disagree. Having unpredictable global scope params blocks is just weird and if we were designing Nextflow today we would never include this behaviour. In other languages, globals need to be used with caution and are generally not advised. Having random params.foo.bar.baz with no way of validating or checking is just "something you have to know", instead of being clear to the author.

In a scenario where I update my workflow from v1.1 to v1.2, an update to params should be explicit in the input block, not implicit, and I hope it doesn't change too much.

If we really want to make them importable, we could add a dedicated params block to the workflow definition:

workflow THING {
   params:
        foo: Int
        bar: Bool
        baz: String

   take:
   // etc
}

but this doesn't feel very different to:

record ThingParams {
    foo: Int
    bar: Bool
    baz: String
}

workflow THING {
   take:
       params: ThingParams

   // etc
}
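
For comparison, the record-style option can be mimicked in Python with a frozen dataclass that rejects unknown, missing, or mistyped values up front. This is only an analogy to show what explicit, checkable params buy you; `load_params` is a hypothetical helper and says nothing about how Nextflow records are implemented.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class ThingParams:
    foo: int
    bar: bool
    baz: str

def load_params(raw):
    """Build ThingParams from a dict, failing loudly on unknown, missing, or mistyped keys."""
    expected = {f.name: f.type for f in fields(ThingParams)}
    unknown = sorted(set(raw) - set(expected))
    missing = sorted(set(expected) - set(raw))
    if unknown or missing:
        raise ValueError(f"unknown={unknown} missing={missing}")
    for name, expected_type in expected.items():
        # Exact type match, so that a bool is not silently accepted as an int.
        if type(raw[name]) is not expected_type:
            raise TypeError(f"{name} must be {expected_type.__name__}")
    return ThingParams(**raw)

thing_params = load_params({"foo": 1, "bar": True, "baz": "x"})
```

A stray `params.foo.bar.baz` style value would fail here immediately instead of being "something you have to know".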

@adamrtalbot

Collaborator

My main concern here is versioning of imported workflows. Do we include a lock file or something to ensure consistency or just trust in the files that are copied into the workflow code?

Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Ben Sherman <bentshermann@gmail.com>

When a pipeline is included, it is vendored into the meta-pipeline project under `workflows/<scope>/<name>/`. Included pipelines are isolated -- each included pipeline has its own `modules/` and `workflows/` directories. This way, two pipelines can use different versions of the same module without compromising reproducibility.

Included pipelines should be committed to the meta-pipeline repository. The pipeline should have a *pipeline spec* (`nextflow_spec.json`) which specifies the pipeline version, so that Nextflow can track local changes.

Member Author

@adamrtalbot

> My main concern here is versioning of imported workflows. Do we include a lock file or something to ensure consistency or just trust in the files that are copied into the workflow code?

See here. Like modules, we will likely want to have some sort of checksum verification (e.g. .pipeline-info)

I guess the simplest way would be to commit the entire pipeline, even though only the core workflow will be used. Then you can have a single checksum for the entire pipeline directory

It's probably still useful to keep the pipeline shell in the meta-pipeline repo, since e.g. your agent will want to refer to it when updating the meta-pipeline
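
A single checksum over the whole vendored directory could look roughly like the Python sketch below. It is an assumption about one possible scheme (SHA-256 over sorted relative paths and file contents), not the format of any existing Nextflow or nf-core metadata file; `directory_checksum` is a hypothetical helper.

```python
import hashlib
import tempfile
from pathlib import Path

def directory_checksum(root):
    """Hash every file's relative path and contents, in a stable order."""
    root = Path(root)
    digest = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()

# Demonstration on a throwaway directory standing in for a vendored pipeline.
with tempfile.TemporaryDirectory() as tmp:
    main = Path(tmp) / "main.nf"
    main.write_text("workflow { }\n")
    recorded = directory_checksum(tmp)
    unchanged = directory_checksum(tmp)
    main.write_text("workflow { println 'edited' }\n")
    modified = directory_checksum(tmp)
```

Comparing the recorded value with a freshly computed one is enough to tell whether anything under the vendored directory was edited locally.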

Collaborator

nf-core copy+pastes modules for subworkflows and it works well!

Comment on lines +282 to +284
```groovy
include { NFCORE_FETCHNGS } from 'nf-core/fetchngs'
include { NFCORE_RNASEQ } from 'nf-core/rnaseq'
```

Member Author

For anyone feeling adventurous, here is what Claude and I came up with while exploring auto-inheritance:

include { NFCORE_FETCHNGS } from 'nf-core/fetchngs'
include { NFCORE_RNASEQ } from 'nf-core/rnaseq'

params {
    input: Path // meta entry point
    strandedness: String = 'auto' // one new knob
    // aligner / fasta / ... inherited from rnaseq's params, override on CLI as --rnaseq.fasta=...
}

workflow {
    main:
    ch_ids = channel.fromPath(params.input).splitCsv()
    ch_samples = NFCORE_FETCHNGS( ch_ids )

    ch_samples = ch_samples.map { r -> r + record(strandedness: params.strandedness) }

    // rnaseq.* params automagically passed to rnaseq workflow via named arguments
    NFCORE_RNASEQ( samples: ch_samples )

    // no publish/output blocks: each pipeline's outputs publish under <output-dir>/<pipeline>/
    // question: what if I don't want to publish something (e.g. fetchngs output)?
}

Feel free to take it and run with it...

Contributor

Exactly the way I was thinking. We just namespace the children's params

Signed-off-by: Ben Sherman <bentshermann@gmail.com>

@pditommaso pditommaso left a comment

Member

Thanks for putting this together, Ben — the dataflow-composition motivation and the rejection of the runtime-inheritance hybrid are both nicely argued. A few thoughts to share before this moves past draft:

1. The key technical challenge could be expanded. At its core this proposes a mechanism to include a fully-fledged Nextflow workflow into another, mimicking how we already include modules and sub-workflows. The part I'd love to see fleshed out is how channels and values get bound into the included workflow's inputs. The ADR sets out the policy (params live at the top level, the core workflow consumes everything via take:) but doesn't yet describe the binding mechanics: how a scalar value vs. a streaming channel is bound at the call site, the value-channel/queue-channel broadcast semantics, and whether a typed take: can accept a bare value type like String/Path. The example here (take: aligner: String) also reads a bit differently from the typed-workflows ADR, where every take: input is a channel type. Since this binding question largely determines feasibility, it'd be great to work it out explicitly.

2. The nomenclature can be better shaped. The document moves between "meta-pipeline" and "remote pipeline", and I think the framing could be sharpened. Terms like workflow modularisation / workflow inclusion / workflow composition might describe what's happening (composing one workflow into another) more directly than introducing a new "meta-pipeline" category.

3. There's some overlap with existing sub-workflow inclusion. Once you discard the entry workflow, params, and output block and import only the core workflow, what's left looks a lot like a sub-workflow. It'd be helpful to clarify how this differs from including a remote sub-workflow, and what the main benefit is that justifies a separate mechanism (separate storage layout, a new nextflow_spec.json, a separate CLI, etc.).

4. A possible framing. I'd lean toward framing the next step as enabling remote sub-workflows — the natural progression after remote modules (processes). Module (process) → sub-workflow → composition feels like a clean, incremental story that reuses the conventions we already have, rather than introducing a "pipeline" as a new top-level artifact with its own resolution rules, storage path, and spec file. If we get remote sub-workflow inclusion right, "meta-pipelines" might largely fall out of it as a usage pattern rather than a new concept.

@bentsherman

Member Author

@pditommaso thanks for the review

> The part I'd love to see fleshed out is how channels and values get bound into the included workflow's inputs.

There isn't much to say here because it just works like normal. In the appendix example, NFCORE_RNASEQ is just a named workflow. The meta-pipeline calls it the same way that rnaseq would call it. The only difference is that some inputs might come from upstream outputs instead of params.

> ... how a scalar value vs. a streaming channel is bound at the call site, the value-channel/queue-channel broadcast semantics, and whether a typed take: can accept a bare value type like String/Path.

A workflow take can be a channel, a dataflow value, or a regular value. This is how it has always worked

> The document moves between "meta-pipeline" and "remote pipeline", and I think the framing could be sharpened. Terms like workflow modularisation / workflow inclusion / workflow composition might describe what's happening (composing one workflow into another) more directly than introducing a new "meta-pipeline" category.

"Meta-pipeline" is the top-line feature that everyone is after, but the only actual new feature proposed by the ADR is "remote pipeline inclusion" -- how to install a pipeline as a component and keep it in sync with the source. This is why the ADR is titled "Remote pipeline inclusion". Once you have that, everything else is just normal workflow composition and convention.

They are distinct concepts -- the ADR does not treat them as interchangeable.

> Once you discard the entry workflow, params, and output block and import only the core workflow, what's left looks a lot like a sub-workflow. It'd be helpful to clarify how this differs from including a remote sub-workflow, and what the main benefit is that justifies a separate mechanism (separate storage layout, a new nextflow_spec.json, a separate CLI, etc.).

The core workflow looks like a subworkflow because it is a subworkflow 😄

The only new thing that we introduce here is installing a pipeline into a project as a component and keeping it in sync with the remote source (either from Git or the registry). For that you likely need a pipeline spec (version, checksum) and a CLI (installing, updating). I just haven't spelled all that out yet because the bigger question right now is how to minimize developer overhead

> I'd lean toward framing the next step as enabling remote sub-workflows -- the natural progression after remote modules (processes). Module (process) → sub-workflow → composition feels like a clean, incremental story that reuses the conventions we already have, rather than introducing a "pipeline" as a new top-level artifact with its own resolution rules, storage path, and spec file. If we get remote sub-workflow inclusion right, "meta-pipelines" might largely fall out of it as a usage pattern rather than a new concept.

Looks like you arrived at the same place as me. Remote workflows are the real feature, meta-pipelines emerge naturally as a convention on top.

I'm not sure whether it's worth trying to distinguish between pipelines / workflows / subworkflows. They're all basically the same thing. Especially if we add the ability to execute named workflows directly (#7208). The difference boils down to boilerplate, which we want to minimize anyway

This is why I just talk about "remote pipeline inclusion", because when I import a workflow, I don't really care whether that workflow is a "pipeline" like rnaseq or a "subworkflow" like BAM_STATS_SAMTOOLS. Workflow composition works the same way either way.

Happy to rename the ADR to "remote workflow inclusion" to align with the workflow keyword.

@ewels

ewels commented Jun 10, 2026

Member

> My main concern here is versioning of imported workflows. Do we include a lock file or something to ensure consistency or just trust in the files that are copied into the workflow code?

Modules have a .moduleinfo file with a hash to allow checking that stuff wasn't modified. I think I saw something similar mentioned here for pipelines / workflows?

@ewels

ewels commented Jun 10, 2026

Member

> Having unpredictable global scope params blocks is just weird and if we were designing Nextflow today we would never include this behaviour. In other languages, globals need to be used with caution and are generally not advised.

@adamrtalbot agreed, I never said global. I would love it if the pipeline config is imported within a dedicated scope and treated as a baseline default. Then the importing pipeline can override anything, but doesn't need to duplicate config that isn't being changed.

Doing this would not be trivial. The only way I can think of is to do something fairly radical like rendering the config at import time and saving that to a locked config file somewhere. Or some other crazy mechanism.
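
A toy Python model of the "scoped baseline plus overrides" idea: the included pipeline's settings are rendered once under their own scope, and the importing pipeline only lists what it changes. The scope name, keys, values, and the `deep_merge` helper are all hypothetical, and this says nothing about how Nextflow config resolution would actually have to work.

```python
def deep_merge(defaults, overrides):
    """Return defaults with overrides applied, merging nested scopes key by key."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# Hypothetical snapshot rendered at import time, stored under the pipeline's own scope.
rendered = {"rnaseq": {"pseudoaligner": "salmon", "skip_qc": False}}
# The importing pipeline only states what it changes.
overrides = {"rnaseq": {"pseudoaligner": "kallisto"}}
effective = deep_merge(rendered, overrides)
```

The merge itself is the easy part; the open questions in this thread are how the snapshot gets rendered and kept in sync when the included pipeline is updated.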

@adamrtalbot

Collaborator

> > Having unpredictable global scope params blocks is just weird and if we were designing Nextflow today we would never include this behaviour. In other languages, globals need to be used with caution and are generally not advised.
>
> @adamrtalbot agreed, I never said global. I would love it if the pipeline config is imported within a dedicated scope and treated as a baseline default. Then the importing pipeline can override anything, but doesn't need to duplicate config that isn't being changed.
>
> Doing this would not be trivial. The only way I can think of is to do something fairly radical like rendering the config at import time and saving that to a locked config file somewhere. Or some other crazy mechanism.

Config or params? In my mind they are very different concepts, I was referring to parameters here.

@adamrtalbot

Collaborator

> Happy to rename the ADR to "remote workflow inclusion" to align with the workflow keyword.

I agree with this. They're all workflows*, the only thing that separates a "pipeline" from a subworkflow is perception.

*except the anonymous entry workflow, which is where the sticky point about params and config comes in 😉

@ewels

ewels commented Jun 11, 2026

Member

> Config or params? In my mind they are very different concepts, I was referring to parameters here.

Ideally params, but might need to be config for all the ext stuff..?

> Happy to rename the ADR to "remote workflow inclusion" to align with the workflow keyword.

Yeah as it stands I think this basically boils down to the functionality we already have with nf-core subworkflows, right? Which is quite far from what I think of as meta-pipelines. Still good to have and useful..

@bentsherman

Member Author

> Yeah as it stands I think this basically boils down to the functionality we already have with nf-core subworkflows, right?

Can the nf-core tooling install a workflow from a pipeline repo? e.g. NFCORE_RNASEQ from nf-core/rnaseq? I think that is the main thing that this ADR adds

@bentsherman bentsherman changed the title ADR: Meta-pipelines ADR: Remote pipeline inclusion Jun 11, 2026
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Comment on lines +72 to +76
// module
include { BWA_MEM } from 'nf-core/bwa/mem'
// pipeline
include { NFCORE_RNASEQ } from 'nf-core/rnaseq'

Member Author

One point that makes me hesitant to reframe the ADR as "remote workflow inclusion" -- here we are referencing the pipeline by name (nf-core/rnaseq)

It could be the GitHub repo or an entity in the Nextflow registry, but either way, the pipeline itself plays a role in facilitating the inclusion. Even if we only include the core workflow (NFCORE_RNASEQ), we likely need to store the entire pipeline code in the meta-pipeline repo, because that is the thing that is versioned

As a user, I will want to know that my meta-pipeline is using a specific pipeline version (e.g. nf-core/rnaseq 3.3.0), so in effect we have to say that we are including the entire pipeline

6 participants