Add documentation for NEMO Pulsar relay mode by dSizovs · Pull Request #82 · usegalaxy-eu/operations

dSizovs · 2026-06-11T08:54:05Z

Adds operations documentation for the bwForCluster NEMO Pulsar endpoint,
which uses pulsar-relay (HTTP) instead of AMQP.

bgruening · 2026-06-11T18:58:21Z

+  pending messages in **Valkey** (Redis-compatible) so an in-flight job is not
+  lost across a relay restart.
+- **Pulsar** runs on a NEMO login node. It long-polls the relay for new job
+  setup / status / kill messages, submits the actual work to **Slurm**, and


After it sends the status back to the relay, Pulsar also transfers data: job input data to Slurm, and results back to Galaxy.

Good point, updated!
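For readers new to the relay model, the long-poll step described in the diff can be sketched roughly as below. This is only an illustration of the pattern, not pulsar-relay's client code: the `poll` callable, the `type` field, and the message shape are all hypothetical stand-ins, and the real Pulsar handlers also stage data in and out as noted above.

```python
from typing import Callable, Dict, List, Optional

Message = Dict[str, str]  # hypothetical message shape, e.g. {"type": "setup", "job_id": "1"}


def drain_relay(
    poll: Callable[[float], Optional[Message]],
    handlers: Dict[str, Callable[[Message], None]],
    timeout: float = 30.0,
    max_polls: int = 3,
) -> List[str]:
    """Long-poll up to max_polls times and dispatch each message by its type.

    `poll` is assumed to block until a message arrives or `timeout` expires,
    returning None on timeout. Returns the types that were handled, in order.
    """
    handled: List[str] = []
    for _ in range(max_polls):
        message = poll(timeout)
        if message is None:
            continue  # nothing pending within the timeout; poll again
        kind = message.get("type", "")
        handler = handlers.get(kind)
        if handler is None:
            continue  # unknown message type; ignored in this sketch
        handler(message)
        handled.append(kind)
    return handled
```

In a real deployment the loop would run indefinitely; `max_polls` exists only so the sketch terminates when tried out with a fake `poll`.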

bgruening · 2026-06-11T18:59:01Z

+| Component | Host | Notes |
+|-----------|------|-------|
+| Galaxy runner `pulsar_eu_nemo` | usegalaxy.eu | Defined in `infrastructure-playbook` `job_conf.yml`; creds from vault |
+| TPV destination `pulsar_nemo_tpv` | usegalaxy.eu | Defined in `infrastructure-playbook` `tpv/destinations.yml.j2`; tag `nemo-pulsar` |


Would it make sense to use real links here?

Linked to the infrastructure-playbook at repo level. If it makes sense to dig out the exact `job_conf.yml` and `tpv/destinations.yml.j2` paths, I'll link those directly.

bgruening · 2026-06-11T18:59:25Z

+|-----------|------|-------|
+| Galaxy runner `pulsar_eu_nemo` | usegalaxy.eu | Defined in `infrastructure-playbook` `job_conf.yml`; creds from vault |
+| TPV destination `pulsar_nemo_tpv` | usegalaxy.eu | Defined in `infrastructure-playbook` `tpv/destinations.yml.j2`; tag `nemo-pulsar` |
+| pulsar-relay | bw-cloud VM | systemd service `pulsar-relay`, listens on `:9000`, Valkey backend |


Where is this deployment defined?

Added a link. It is deployed by pulsar-relay-role (now under usegalaxy-eu, yay🎉).

bgruening · 2026-06-11T19:00:12Z

+this happens today:
+
+1. **User-level opt-in**, a user selects the NEMO compute resource in
+   *User → Preferences → Manage Information → Use distributed compute


Also this can be a link.

bgruening · 2026-06-11T19:00:42Z

+1. **User-level opt-in**, a user selects the NEMO compute resource in
+   *User → Preferences → Manage Information → Use distributed compute
+   resources* ("Freiburg (Germany) - bwForCluster NEMO 2").
+2. **Per-user TPV rule**, an entry in `tpv/users.yml` that attaches the


There are multiple other ways, e.g. we could tag a specific tool to always go to NEMO.

Added per-tool routing as a third option, though I've only actually used the user opt-in and per-user rule myself.

bgruening · 2026-06-11T19:03:22Z

+
+## Pulsar (NEMO login node)
+
+NEMO does not provide user-level systemd, so Pulsar is kept alive by a small


Where is this wrapper script, how complex is it, and should we maybe use supervisord instead?

It's in pulsar-nemo-login-role (templates/): a ~10-line `while true` loop, since NEMO has no user-level systemd. Added a note proposing supervisord as a cleaner replacement.
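To make the keep-alive idea concrete, here is a minimal Python sketch of the same restart-on-exit pattern. The actual wrapper is a shell script in the role's templates and is not reproduced here; the function name, the `restart_delay` and `max_runs` parameters, and the command passed in are all hypothetical.

```python
import subprocess
import time
from typing import List, Optional


def keep_alive(cmd: List[str], restart_delay: float = 5.0,
               max_runs: Optional[int] = None) -> int:
    """Run cmd and start it again each time it exits.

    With max_runs=None the loop never ends, which mirrors a `while true`
    wrapper. Returns the number of times the command was run.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        subprocess.run(cmd)  # blocks until the child process exits
        runs += 1
        time.sleep(restart_delay)  # pause so a crash loop does not spin the CPU
    return runs
```

A process supervisor such as supervisord covers the same ground and adds logging, backoff, and a control CLI, which is the trade-off raised in the review comment.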

bgruening · 2026-06-11T19:04:07Z

+message_queue_url: http://<relay-host>:9000/
+message_queue_username: admin
+message_queue_password: <in vault>
+staging_directory: /home/.../pulsar/jobs_directory


Is this HOME dir configurable in the playbook?

Yes, it's the pulsar_nemo_home role variable (default ~/pulsar), so it's configurable per deployment. Noted that in the updated doc.
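A small pre-flight check along these lines could catch a mismatched config before Pulsar starts. It is a sketch under assumptions, not part of the role: the function name is hypothetical, it assumes the relay URL should use an `http` or `https` scheme, and it assumes the staging directory lives under the `pulsar_nemo_home` directory, as the quoted path suggests.

```python
from pathlib import Path
from typing import Dict, List
from urllib.parse import urlparse


def check_relay_settings(settings: Dict[str, str],
                         pulsar_home: str = "~/pulsar") -> List[str]:
    """Return a list of human-readable problems; an empty list means no findings."""
    problems: List[str] = []

    # Relay mode talks HTTP, so an AMQP-style URL here is likely a leftover.
    scheme = urlparse(settings.get("message_queue_url", "")).scheme
    if scheme not in ("http", "https"):
        problems.append("message_queue_url scheme is %r, expected http(s)" % scheme)

    # Assumption: staging_directory should sit somewhere below the Pulsar home.
    home = Path(pulsar_home).expanduser()
    staging = Path(settings.get("staging_directory", "")).expanduser()
    if home not in staging.parents:
        problems.append("staging_directory %s is not under %s" % (staging, home))

    return problems
```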

bgruening · 2026-06-11T19:05:07Z

+
+```bash
+# is Pulsar running?
+ps aux | grep pulsar-main | grep -v grep


We could then use supervisorctl here instead.

Agree, flagged supervisord/supervisorctl as a future improvement.
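Until then, the `ps | grep` check can also be scripted. The sketch below mirrors the quoted pipeline in Python; the function names are hypothetical, and the filter on `grep` is kept only for parity with `grep -v grep`, since a Python caller does not add a grep process to the listing.

```python
import subprocess
from typing import List


def matching_processes(ps_output: str, needle: str = "pulsar-main") -> List[str]:
    """Return the ps lines that mention needle, skipping any grep processes."""
    return [
        line
        for line in ps_output.splitlines()
        if needle in line and "grep" not in line
    ]


def pulsar_is_running() -> bool:
    """Run `ps aux` and report whether any pulsar-main process is listed."""
    result = subprocess.run(["ps", "aux"], capture_output=True, text=True, check=True)
    return bool(matching_processes(result.stdout))
```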

bgruening · 2026-06-11T19:06:52Z

+**Job stuck in "queued"/"running" forever, but Slurm shows COMPLETED**
+Pulsar is submitting and the job finishes, but completion is not propagating
+back. Confirm the Slurm CLI status plugin maps the "job no longer in squeue"
+case to `complete`. (This was a real bug when Galaxy is importable in the same


Why would you ever have Galaxy and Pulsar in the same env? Galaxy is not installed on the login node, is it?

That was my surprise too, so I verified it on NEMO. Galaxy is importable because the deployment uses pulsar-galaxy-lib (0.15.14), which bundles the Galaxy libs (galaxy-schema, galaxy-data, galaxy-tool-util, …). So in `slurm.py` the `try: from galaxy.model import Job` succeeds and `job_states` becomes Galaxy's enum: `inspect.getfile(job_states)` returns `galaxy/schema/schema.py`, where `OK.value == 'ok'`. The stateful manager compares against `status.COMPLETE == 'complete'` (`pulsar/managers/status.py`), so `'ok' != 'complete'` and the job never deactivates.
A clean pulsar-app install hits the `ImportError` fallback (`OK = 'complete'`) and never sees this, which is why org/AU are fine. The fix is at galaxyproject/pulsar#460.
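The mismatch is easier to see in isolation. The sketch below is a simplified stand-in, not the code in `slurm.py` or `status.py`: the class and function names are hypothetical, and only the two string values (`'ok'` and `'complete'`) are taken from the explanation above.

```python
COMPLETE = "complete"  # the value the stateful manager is described as comparing against


class GalaxyLikeStates:
    """Stand-in for Galaxy's job state enum, where OK carries the value 'ok'."""
    OK = "ok"


def galaxy_available():
    """Simulates an environment where the Galaxy libs are importable."""
    return GalaxyLikeStates


def galaxy_missing():
    """Simulates a clean install where importing Galaxy fails."""
    raise ImportError("No module named 'galaxy'")


def resolve_ok_state(importer) -> str:
    """Use the imported states if possible, else fall back to 'complete'."""
    try:
        return importer().OK
    except ImportError:
        return "complete"


def job_deactivates(ok_state: str) -> bool:
    """A finished job is only deactivated if its state equals COMPLETE."""
    return ok_state == COMPLETE
```

With `galaxy_missing` the fallback yields `'complete'` and the job deactivates; with `galaxy_available` the state is `'ok'`, the comparison fails, and the job stays active, which matches the stuck-job symptom.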

bgruening · 2026-06-11T19:07:56Z

+
+**`No such transport: http`**
+The installed Pulsar version is routing the relay URL through the AMQP/kombu
+path. Use a Pulsar build with relay support that is compatible with the NEMO


Which version is that? Make sure you are running Pulsar >= x.x.

Known good is 0.15.15.dev0 on Python 3.9, noted in the troubleshooting section and pinned in the login role.
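When debugging this on a login node, it can help to print what is actually installed in the Pulsar environment. The helper below is a sketch using only the standard library; the function names are hypothetical, and the two distribution names are the ones mentioned in this thread (`pulsar-app`, `pulsar-galaxy-lib`), either of which may be absent.

```python
import sys
from importlib import metadata
from typing import Dict, Optional, Sequence


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def environment_report(
    dist_names: Sequence[str] = ("pulsar-app", "pulsar-galaxy-lib"),
) -> Dict[str, Optional[str]]:
    """Collect the Python version and the version of each named distribution."""
    info: Dict[str, Optional[str]] = {"python": "%d.%d" % sys.version_info[:2]}
    for name in dist_names:
        info[name] = installed_version(name)
    return info
```

Running `environment_report()` with the interpreter from the Pulsar virtualenv gives values to compare against the known-good combination quoted above.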

Add documentation for NEMO Pulsar relay mode

11f2fd9

bgruening reviewed Jun 12, 2026


Revise NEMO Pulsar relay documentation

f0f7323


