feat(slurm): add startup_script support to slurm login nodes#5800
feat(slurm): add startup_script support to slurm login nodes#5800kvenkatachala333 wants to merge 1 commit into
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces support for startup scripts on Slurm login nodes within the Google Cloud Platform cluster toolkit. By updating the schema and wiring the necessary logic through the controller and login modules, users can now successfully inject custom startup scripts into their login node configurations. The changes are designed to be fully backward compatible, ensuring that existing deployments remain unaffected. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request adds support for custom startup scripts in the login nodes of the schedmd-slurm-gcp-v6 scheduler module. The review feedback highlights two potential plan-time crashes related to null values: one where concat() could fail if each.value.startup_script is null or missing, and another where a null var.startup_script could cause a crash during downstream resource computation. Both issues can be resolved by using safe fallbacks and conditional checks.
| login_nodes = each.value | ||
|
|
||
| startup_scripts = local.login_startup_scripts | ||
| startup_scripts = concat(local.login_startup_scripts, each.value.startup_script) |
There was a problem hiding this comment.
If each.value.startup_script is explicitly set to null or is missing, the concat function will fail at plan-time because it expects all arguments to be lists.
To prevent plan-time crashes and safely handle null or missing values (in alignment with the general rule of keeping variables nullable and using safe fallbacks), use a try() block to safely access the attribute and default to an empty list [] if it is null or missing.
startup_scripts = concat(local.login_startup_scripts, try(each.value.startup_script != null ? each.value.startup_script : [], []))
References
- To allow variables to be explicitly set to null without automatically overriding them to default configurations, keep the variable nullable and use try() blocks in the main configuration to safely access its attributes and prevent plan-time crashes.
There was a problem hiding this comment.
can you please check if this is required to be addressed or not
| ghpc_startup_script = [{ | ||
| filename = "ghpc_login_startup.sh" | ||
| content = var.startup_script | ||
| }] |
There was a problem hiding this comment.
If var.startup_script is explicitly set to null, content will be null. This will cause a plan-time crash in the downstream google_storage_bucket_object.startup_scripts resource when computing md5(each.value).
To prevent this, instead of overriding the user's explicit null setting with a default string, conditionally omit the startup script configuration entirely if var.startup_script is null.
ghpc_startup_script = var.startup_script != null ? [{
filename = "ghpc_login_startup.sh"
content = var.startup_script
}] : []
References
- To allow variables to be explicitly set to null without automatically overriding them to default configurations, keep the variable nullable and use try() blocks in the main configuration to safely access its attributes and prevent plan-time crashes.
Description
Fixes #5716 by adding
startup_scriptsupport to theschedmd-slurm-gcp-v6-loginmodule.Previously, the
runner-startupmodule was silently ignored on login nodes because the variable was missing from the login module schema. This PR plumbs the script correctly through both theloginandcontrollermodules so that the blueprint generator can successfully inject startup scripts into login nodes.Changes
schedmd-slurm-gcp-v6-login: Addedstartup_scriptinput variable and wired it to output via thelogin_nodemap.schedmd-slurm-gcp-v6-controller: Updated thelogin_nodesvariable schema to acceptstartup_scriptand appended it to the internal login module instantiation logic.Existing configurations and workflows lacking a
startup_scriptwill gracefully fallback to empty lists and# no-opstrings, ensuring 100% backward compatibility.