add: AWS ARM node tooling upgrade guide for Orka 3.6 (OK-5476) #260
celanthe wants to merge 8 commits
Documents the new Ansible-based in-place upgrade path for ARM EC2 Mac nodes, replacing the AMI replacement approach. Covers SSH/SSM prereqs, required node tag (role=orka-arm), what's preserved during upgrade, and what changes in 3.5 to 3.6 for AWS deployments. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ides to AWS upgrade guide Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
A lot of the examples in the original doc are missing.
My suggestion would be to add these to the doc. |
|
One general theme I notice is that we say that:
It is a self service upgrade. No need to contact us for anything. |
|
The doc description talks mostly about the ARM nodes, but it doesn't mention the services upgrade. Is the goal a general upgrade guide (both services and nodes) or a nodes-only guide? Nodes are now upgraded with Ansible.
…review Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…/IAM examples Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…n sets Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Thanks for the review @ispasov . I updated the PR to address your feedback. Rewrote the framing as fully self-service throughout, added the CodeBuild buildspec, Secrets Manager IAM policy, and SSM Session Manager and S3 permissions as copy-paste blocks. Removed the credential scoping section. Expanded scope to cover both services and node tooling upgrades, with a note that AMI replacement is still required for host OS upgrades. Multi-region section updated to cover both the multi-step and per-run override approaches. If anything else needs updating, do let me know and I will address it. |
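For readers following the node-tag and multi-region points, a dynamic inventory along these lines is one way the `role=orka-arm` tag could be used to select nodes. This is a sketch only: it assumes the standard `amazon.aws.aws_ec2` inventory plugin, and the regions and host-address choice are placeholders rather than the contents of the inventory files shipped with the Orka Ansible image.

```yaml
# Hypothetical aws_ec2 dynamic inventory; not the file shipped with Orka.
plugin: amazon.aws.aws_ec2
# Assumed regions; list every region that hosts ARM nodes.
regions:
  - us-east-1
  - us-west-2
filters:
  # Only running instances carrying the required node tag are targeted.
  "tag:role": orka-arm
  instance-state-name: running
# Assumption: nodes are reached on their private address.
hostnames:
  - private-ip-address
```

The plugin only loads inventory files whose names end in `aws_ec2.yml` or `aws_ec2.yaml`, which matches the naming of `arm.ssm.aws_ec2.yml` in the guide.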
| env: | ||
| shell: bash | ||
| secrets-manager: | ||
| SSH_PRIVATE_KEY: "<your-secret-name>" |
Let's show an example of how to set the default region here as well.
You have it in the command above, but ideally we show people that they have options.
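A minimal sketch of what that could look like, assuming the region is set as a plain environment variable next to the Secrets Manager reference. The region value is an example only, and the rest of the buildspec is omitted.

```yaml
# Sketch of a buildspec env block; the region value is an example only.
version: 0.2
env:
  shell: bash
  variables:
    # Assumed default region; replace with the region of your ARM nodes.
    AWS_DEFAULT_REGION: "us-east-1"
  secrets-manager:
    SSH_PRIVATE_KEY: "<your-secret-name>"
```

A value set here acts as the default for every build, and CodeBuild still allows environment variables to be overridden when an individual build is started.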
|
|
||
| Previously, updating tooling on ARM nodes required replacing the EC2 Mac AMI: the instance had to be deleted, a new one provisioned (a process that takes approximately 2 hours), and the node's name, namespace, and custom tags had to be manually reapplied. | ||
|
|
||
| Starting with the 3.5 to 3.6 upgrade path, ARM node tooling is updated in place using Ansible over SSH. The upgrade takes under 10 minutes per node. The following are read from the running node and reapplied automatically: node name, node IP, cluster registration, license key, VM quota, and storage layout (including data volumes on instances with local NVMe). Running VMs are not interrupted. |
It takes under 10 minutes combined, not per node, although this depends on the number of nodes.
If they have 100 nodes, it may take 15 minutes.
|
|
||
| ### Upgrade Service is installed | ||
|
|
||
| As part of the 3.6 upgrade, the Orka Upgrade Service is deployed to your cluster. This enables smoother tooling updates in future Orka releases without requiring AMI replacement. |
"without requiring AMI replacement." There will be cases where a replacement is needed (you mention them above).
Maybe we should clarify this so people do not get confused.
We can say "without having to run CodeBuild".
|
|
||
| ### cert-manager behavior change | ||
|
|
||
| Orka no longer installs its own cert-manager if one is already present in the cluster. If your cluster runs its own cert-manager and you previously experienced version or configuration conflicts with Orka's bundled installation, those conflicts are resolved in 3.6. |
A small nuance needs to be added: Orka skips installing cert-manager only if the customer configures it not to install it. It does not check for the presence of another cert-manager installation.
| ## After the upgrade | ||
|
|
||
| 1. [Download and install](/orka/orka-overview/tools-integrations) the Orka 3.6 CLI if you haven't already. | ||
| 2. Regenerate Service Account tokens for any automated workflows: `orka3 serviceaccount token <name>` |
2 and 3 are not relevant.
SA regeneration is MSDC specific. Only when the K8s cluster is recreated.
Images are not removed from the cache during an upgrade
| 3. Repopulate the image cache on your ARM nodes if needed: `orka3 imagecache add <image> --all` | ||
|
|
||
| <Warning> | ||
| Service Account tokens must be regenerated after this upgrade. Any automated workflows using service account tokens will fail until tokens are regenerated with `orka3 serviceaccount token <name>`. |
mentioned this above - this is not relevant
|
|
||
| ## Upgrading the Orka services | ||
|
|
||
| The Orka Kubernetes services are upgraded the same way they were installed: run the CodeBuild project pointed at the Orka 3.6 Ansible image. No additional configuration is required. |
Let's add a link to the original doc, in case people do not remember what they did.
|
|
||
| If your nodes cannot accept SSH, the upgrade can run over SSM instead. SSM upgrades require an S3 bucket in the same region as your ARM nodes for Ansible file transfer, and can take significantly longer (up to 4 hours). SSH is strongly recommended. | ||
|
|
||
| ### Enabling SSH on nodes launched without a key pair |
We can link the article from the Confluence doc that points to the AWS doc explaining how to rotate the SSH key.
I imagine people would want to rotate it.
| ansible-playbook -i arm.ssm.aws_ec2.yml configure-arm.yml | ||
| ``` | ||
|
|
||
| ## Changing node values during the upgrade |
Let's mention that this can be run at any time, not only during an upgrade.
Customers may want to rename nodes: they run the CodeBuild project configured above and pass one of the Ansible vars here.
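As a rough illustration of such a standalone run, the build phase could pass the override to the playbook as an extra variable. This is a sketch under assumptions: the variable name and value are placeholders (the real names are the ones listed in the guide's "Changing node values" section), and the inventory shown is the SSM one quoted in the diff, so an SSH-based run would use its own inventory file.

```yaml
# Sketch of a buildspec build phase; the variable name is a placeholder.
phases:
  build:
    commands:
      # -e passes an extra variable to the playbook for this run only.
      - ansible-playbook -i arm.ssm.aws_ec2.yml configure-arm.yml -e "<ansible-var>=<new-value>"
```

Extra variables passed with `-e` take precedence over values defined elsewhere in Ansible, which is why a single override is enough to change one node value without editing the playbook.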
Thanks for the second pass, @ispasov. Updated:
- Timing fixed: "Under 10 minutes," total (Ansible parallel), not per node
- Section heading and Upgrade Service copy updated to "no longer require provisioning a new EC2 Mac instance," made it clearer what's changed without implying AMI replacement is gone entirely
- cert-manager: rewritten as explicit opt-out, not presence-detection
- "After the upgrade" trimmed to just the CLI install, Service Account token regeneration and image cache repopulation removed as these are MSDC specific
- Added AWS_DEFAULT_REGION example variable in the buildspec
- Added key rotation link and installation guide link
- "Changing node values" section: Renamed and clarified it can be run independently, not only during an upgrade
Let me know if any of these landed wrong, and thank you again! :)
- Add AWS blog link for SSH key pair rotation in Enabling SSH section
- Add link to installation guide in Upgrading the Orka services section
- Add AWS_DEFAULT_REGION example variable in buildspec env.variables
- Rename "Changing node values during the upgrade" to "Changing node values"; clarify playbook can run independently to rename nodes
- Fix timing: Ansible runs nodes in parallel, typical deployment under 10 minutes total (not per node)
- Update section heading and Upgrade Service copy: "no longer require provisioning a new EC2 Mac instance" instead of "AMI replacement" (AMI replacement still required for host OS upgrades)
- Fix cert-manager language: opt-out via explicit config, not auto-detected presence
- Remove SA token regeneration and image cache items from After the upgrade (MSDC-specific, not applicable to AWS self-service); remove Warning block

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
- `upgrading-orka-on-aws.mdx`: customer-facing upgrade guide for AWS deployments (3.5 to 3.6)
- Covers SSH/SSM prereqs, required node tag (`role=orka-arm`), what's preserved during upgrade (node name, IP, cluster registration, license key, VM quota, storage layout, running VMs), and all AWS-specific changes in 3.6
- `docs.json`

Tracks MPD-67. Scoped to EC2 ARM nodes only; the hybrid deployment section (EKS control plane + on-prem Mac nodes) is a separate follow-on pending engineering input (DI-623).
Test plan
🤖 Generated with Claude Code