Update gadi config to reduce wasted compute by Mitchob · Pull Request #1 · AustralianBioCommons/sbp-workflow-configs

Mitchob · 2026-05-28T23:19:54Z

Summary

Resource optimisation for SBP_gadi.config based on benchmarking of nf-core/proteinfold
processes (AlphaFold2, ColabFold, Boltz) on NCI Gadi, informed by 10 test runs using T1024 as a use case. Results are available on Google Drive. The MSA steps had the most room for tuning to reduce cost.

Following the TL's suggestion, I prioritised SU cost based on NCI's costing. It's possible that some steps would get a speed-up from a lower CPU request, but we would still be charged at the same rate. There is a useful calc from sih.
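To illustrate why lowering the CPU request does not necessarily lower the charge, here is a minimal sketch. It assumes the charge is the queue rate multiplied by the larger of the CPU request and the memory request's share of a node, multiplied by walltime. That formula, the function names, and the 1-hour walltime are assumptions for illustration; the 28 CPU / 256 GB node shape and the 1.25 SU/CPU-hr rate come from the Results section.

```python
def charged_cpus(cpus: int, mem_gb: float, node_cpus: int, node_mem_gb: float) -> float:
    """CPUs billed: the larger of the CPU request and the memory request's share of a node.

    Assumption: memory is converted to "CPU equivalents" in proportion to the node shape.
    """
    mem_share_cpus = mem_gb * node_cpus / node_mem_gb
    return max(cpus, mem_share_cpus)


def su_cost(cpus: int, mem_gb: float, hours: float, rate: float,
            node_cpus: int, node_mem_gb: float) -> float:
    """Estimated SU for one job under the assumed charging formula."""
    return rate * charged_cpus(cpus, mem_gb, node_cpus, node_mem_gb) * hours


# Node shape and rate as quoted for normalbw in this PR; walltime is hypothetical.
NODE_CPUS, NODE_MEM_GB, RATE = 28, 256, 1.25

full = su_cost(28, 256, 1.0, RATE, NODE_CPUS, NODE_MEM_GB)
fewer_cpus = su_cost(14, 256, 1.0, RATE, NODE_CPUS, NODE_MEM_GB)

# Halving the CPU request while keeping full-node memory leaves the estimate unchanged.
print(f"28 CPU: {full:.2f} SU, 14 CPU: {fewer_cpus:.2f} SU")
```

Under this assumption, a memory-bound step only gets cheaper when its memory request or walltime drops, not its CPU count.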

Results

MMSEQS_COLABFOLDSEARCH (cpus 28 memory 256GB)

Previous settings with CPU/mem requests below 20 CPUs caused OOM failures; the MMseqs2 prefilter allocates memory relative to
available RAM and kills child workers when constrained
Benchmarking showed the full normalbw node (28 CPU / 252 GB, 1.25 SU/CPU-hr) is both cheaper and faster than the next smaller configuration (24 CPU / 96 GB): 77 SU / 2:10h vs 85 SU / 2:49h
scratch=false retained: MMseqs2 reads large databases in-place; scratch mode causes
staging failures
MSA tasks now run on a full normalbw node. The higher CPU count is requested for these samples only to obtain the memory, so it is cheaper to use 28 normalbw CPUs than 24 normal CPUs with 48 CPUs' worth of memory
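As a rough cross-check of the two MMSEQS_COLABFOLDSEARCH figures above, the sketch below recomputes SU from the reported CPU counts and walltimes. It assumes both runs were charged at the normalbw rate of 1.25 SU/CPU-hr and that the charge was driven by the CPU request rather than memory. The helper name is hypothetical.

```python
def hours(walltime: str) -> float:
    """Convert an 'H:MM' walltime string to decimal hours."""
    h, m = walltime.split(":")
    return int(h) + int(m) / 60


RATE_NORMALBW = 1.25  # SU per CPU-hour, as quoted in the benchmarking notes

# (CPUs, walltime) as reported above
runs = {
    "28 CPU / 252 GB": (28, "2:10"),
    "24 CPU / 96 GB": (24, "2:49"),
}

# Assumption: charge = rate * CPUs * walltime (CPU-dominated request)
estimates = {
    label: RATE_NORMALBW * cpus * hours(walltime)
    for label, (cpus, walltime) in runs.items()
}

for label, su in estimates.items():
    print(f"{label}: ~{su:.1f} SU")
```

The estimates come out close to the reported 77 SU and 85 SU; the small gap is plausibly due to walltimes being rounded to the minute here. The full node wins because the shorter walltime more than offsets the four extra CPUs.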

RUN_ALPHAFOLD2_MSA (cpus 18 memory 72GB)

The previous 12 CPU / 48 GB allocation ran at 100% memory utilisation (borderline); 16 CPU / 64 GB also hit the ceiling (98.7% utilisation)
18 CPU / 72 GB is the first configuration with headroom (70% memory utilisation, 50 GB used), at equivalent SU cost (~8.4 SU) and marginally faster walltime (0:22h vs 0:27h at 12 CPU)
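The selection logic for RUN_ALPHAFOLD2_MSA can be sketched as picking the smallest tested configuration whose memory utilisation stays under a headroom threshold. The 90% threshold is a hypothetical value chosen for illustration, not one stated in this PR; the utilisation figures are the ones reported above.

```python
HEADROOM_THRESHOLD = 0.90  # hypothetical cut-off; not taken from the PR

# (label, memory utilisation) in increasing order of size, from the benchmarking above
configs = [
    ("12 CPU / 48 GB", 1.00),
    ("16 CPU / 64 GB", 0.987),
    ("18 CPU / 72 GB", 50 / 72),  # 50 GB used out of 72 GB requested
]

# First (smallest) configuration that leaves memory headroom
first_ok = next(label for label, util in configs if util <= HEADROOM_THRESHOLD)

print(f"18 CPU utilisation: {50 / 72:.0%}")
print(f"Smallest configuration with headroom: {first_ok}")
```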

Notes

All GPU processes are set to gpuhopper, as per the TL's suggestion based on prior benchmarking
Hugemem queue benchmarking for MMSEQS has not yet been performed. A full normalbw node is currently the optimal configuration, but hugemem may warrant testing because it allows a higher memory request with fewer CPUs. It's double the SU rate, so it would need to make up the difference through walltime savings. This would be particularly useful if a higher CPU count actually decreases efficiency or increases walltime.
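The break-even point for a hugemem trial can be estimated as below. This is a sketch under stated assumptions: "double the SU rate" is read as twice the normalbw rate of 1.25 SU/CPU-hr, the charge is taken as rate times charged CPUs times walltime, and the CPU counts tried are hypothetical. On a real queue the charged CPU count may be driven by the memory request instead.

```python
def break_even_hours(baseline_su: float, rate: float, charged_cpus: float) -> float:
    """Walltime at which a job at the given rate and CPU charge costs exactly baseline_su."""
    return baseline_su / (rate * charged_cpus)


BASELINE_SU = 77.0          # full normalbw node, from the benchmarking above
HUGEMEM_RATE = 2 * 1.25     # assumption: "double" is relative to the normalbw rate

# Hypothetical CPU charges to try on the higher-rate queue
limits = {cpus: break_even_hours(BASELINE_SU, HUGEMEM_RATE, cpus) for cpus in (28, 14)}

for cpus, limit in limits.items():
    print(f"{cpus} charged CPUs: must finish in under {limit:.2f} h to beat {BASELINE_SU} SU")
```

Under these assumptions, a run charged for 14 CPUs would need to finish in roughly the same walltime as the current full-node run (2:10h) just to break even.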

vtnphan

Thank you for making these so beautiful. I have some questions about defining params in params { } and assigning them in process later

updated process parameters based on testing

da1a164

Mitchob requested a review from amandazhuyilan May 28, 2026 23:19

vtnphan reviewed Jun 2, 2026

Comment thread proteinfold.config

vtnphan reviewed Jun 2, 2026

Comment thread proteinfold.config Outdated

vtnphan requested changes Jun 2, 2026

vtnphan and others added 5 commits June 10, 2026 12:57

fix: remove clusterOptions

881dbbb

Removed hardcoded cluster options and added a comment about backend population.

fix: replace if/else by closure-based

bdba7c0

fix: remove workdir

fd33fde

update proteinfold config for individual resources

2e4fb5d

proteinfold use gpu

c6db07e

vtnphan self-requested a review June 11, 2026 00:31

update gadi gpu config

24062e3

Mitchob wants to merge 7 commits into main from SBP-357
3 participants