Merged
95 commits
30e23d7
CASM-5740: [FM on baremetal] Documentation for FM migration to baremetal
ravikanth-nalla-hpe Nov 14, 2025
cfd4f9b
Update README.md
ravikanth-nalla-hpe Nov 18, 2025
74eca40
Create Enabling_FM_On_Baremetal_Post_CSM_Upgrade.md
ravikanth-nalla-hpe Nov 18, 2025
f9e80c6
Update Enabling_FM_On_Baremetal_Post_CSM_Upgrade.md
ravikanth-nalla-hpe Nov 20, 2025
07291aa
Update Enabling_FM_On_Baremetal_Post_CSM_Upgrade.md
ravikanth-nalla-hpe Nov 20, 2025
7acddab
Update Enabling_FM_On_Baremetal_Post_CSM_Upgrade.md
ravikanth-nalla-hpe Nov 20, 2025
d24ad07
Update README.md
ravikanth-nalla-hpe Nov 20, 2025
65bb9c8
Update README.md
ravikanth-nalla-hpe Nov 20, 2025
8d8a67a
Update README.md
ravikanth-nalla-hpe Nov 20, 2025
352c093
Update Enabling_FM_On_Baremetal_Post_CSM_Upgrade.md
ravikanth-nalla-hpe Nov 20, 2025
066cba0
Update README.md
ravikanth-nalla-hpe Nov 21, 2025
e36d4ae
Update README.md
ravikanth-nalla-hpe Nov 21, 2025
95b2374
Update README.md
ravikanth-nalla-hpe Nov 21, 2025
5b757f6
Update README.md
ravikanth-nalla-hpe Nov 21, 2025
675cc0f
Update Enabling_FM_On_Baremetal_Post_CSM_Upgrade.md
ravikanth-nalla-hpe Nov 21, 2025
fe154a7
Update README.md
ravikanth-nalla-hpe Nov 23, 2025
7f58def
Update README.md
ravikanth-nalla-hpe Nov 23, 2025
4e0072f
Update Enabling_FM_On_Baremetal_Post_CSM_Upgrade.md
ravikanth-nalla-hpe Nov 23, 2025
40bef73
Update README.md
ravikanth-nalla-hpe Nov 23, 2025
7194d28
Update README.md
ravikanth-nalla-hpe Nov 23, 2025
a55d4fc
Rename Enabling_FM_On_Baremetal_Post_CSM_Upgrade.md to Configure_FM_O…
ravikanth-nalla-hpe Nov 23, 2025
f76a711
Update Configure_FM_On_Baremetal.md
ravikanth-nalla-hpe Nov 23, 2025
41d762c
Update README.md
ravikanth-nalla-hpe Nov 23, 2025
42a50d7
Add files via upload
ravikanth-nalla-hpe Nov 23, 2025
5ea1dc3
Update README.md
ravikanth-nalla-hpe Nov 23, 2025
724763d
Update README.md
ravikanth-nalla-hpe Dec 1, 2025
bbd5e1d
Update Add_Remove_Replace_NCNs.md
ravikanth-nalla-hpe Dec 1, 2025
98bc4f2
Update Add_Remove_Replace_NCNs.md
ravikanth-nalla-hpe Dec 2, 2025
29f8339
Update Configure_FM_On_Baremetal.md
ravikanth-nalla-hpe Dec 4, 2025
18904c5
Update Configure_FM_On_Baremetal.md
ravikanth-nalla-hpe Dec 4, 2025
1412e04
Update Add_NCN_Data.md
ravikanth-nalla-hpe Dec 4, 2025
c4196d0
Update Configure_FM_On_Baremetal.md
ravikanth-nalla-hpe Dec 4, 2025
226ee6a
Update Configure_FM_On_Baremetal.md
ravikanth-nalla-hpe Dec 4, 2025
cc1b9a8
Update Configure_FM_On_Baremetal.md
ravikanth-nalla-hpe Dec 4, 2025
757bb09
Update README.md
ravikanth-nalla-hpe Dec 4, 2025
c2657cd
Update Configure_FM_On_Baremetal.md
ravikanth-nalla-hpe Dec 4, 2025
e206501
Update Configure_FM_On_Baremetal.md
ravikanth-nalla-hpe Dec 5, 2025
164c396
Update Configure_FM_On_Baremetal.md
ravikanth-nalla-hpe Dec 9, 2025
8842e7e
Update README.md
ravikanth-nalla-hpe Dec 9, 2025
e4c8605
Update Add_NCN_Data.md
ravikanth-nalla-hpe Dec 9, 2025
b68531f
Update Add_Remove_Replace_NCNs.md
ravikanth-nalla-hpe Dec 9, 2025
e8d90d5
Update Boot_NCN.md
ravikanth-nalla-hpe Dec 9, 2025
c53a1c9
CASMINST-7513 - Add steps to deploy fmn nodes during CSM install (#6451)
spillerc-hpe Jan 8, 2026
a8158c2
CASMINST-7513 - DOCS: Add steps to deploy fmn nodes during CSM instal…
spillerc-hpe Jan 15, 2026
47bf963
CASMTRIAGE-8993 - apply_csm_configuration.sh needs to be updated to i…
spillerc-hpe Jan 19, 2026
eccdb7d
Update Configure_FM_On_Baremetal.md
ravikanth-nalla-hpe Jun 2, 2026
c28afef
Update Configure_FM_On_Baremetal.md
ravikanth-nalla-hpe Jun 2, 2026
d5e22b6
Update Configure_FM_On_Baremetal.md
ravikanth-nalla-hpe Jun 2, 2026
b5abb30
Update README.md
ravikanth-nalla-hpe Jun 2, 2026
cf56df8
Update README.md
ravikanth-nalla-hpe Jun 2, 2026
2e2233e
Update Configure_FM_On_Baremetal.md
ravikanth-nalla-hpe Jun 2, 2026
0be95c8
Update README.md
ravikanth-nalla-hpe Jun 2, 2026
140f74a
Update README.md
ravikanth-nalla-hpe Jun 2, 2026
46234b9
Update README.md
ravikanth-nalla-hpe Jun 2, 2026
f7ea395
Update README.md
ravikanth-nalla-hpe Jun 2, 2026
4c8e2fb
Update Configure_FM_On_Baremetal.md
ravikanth-nalla-hpe Jun 2, 2026
4fa9294
Update Configure_FM_On_Baremetal.md
ravikanth-nalla-hpe Jun 2, 2026
c311d71
Update glossary.md
ravikanth-nalla-hpe Jun 2, 2026
eac0892
Update README.md
ravikanth-nalla-hpe Jun 2, 2026
4a01d92
Update README.md
ravikanth-nalla-hpe Jun 2, 2026
118ea2a
Update Configure_FM_On_Baremetal.md
ravikanth-nalla-hpe Jun 2, 2026
3ca8d56
Update .spelling
ravikanth-nalla-hpe Jun 2, 2026
4429376
Update Configure_FM_On_Baremetal.md
ravikanth-nalla-hpe Jun 2, 2026
5149d94
Update Configure_FM_On_Baremetal.md
ravikanth-nalla-hpe Jun 2, 2026
fecc478
Update README.md
ravikanth-nalla-hpe Jun 2, 2026
b739ccd
Update Redeploy_Fabric_Manager_Nodes.md
ravikanth-nalla-hpe Jun 2, 2026
6eaa050
Update Add_NCN_Data.md
ravikanth-nalla-hpe Jun 2, 2026
17c6544
Update Add_Remove_Replace_NCNs.md
ravikanth-nalla-hpe Jun 2, 2026
db79f27
Update Boot_NCN.md
ravikanth-nalla-hpe Jun 2, 2026
8388ef2
Update README.md
ravikanth-nalla-hpe Jun 4, 2026
9cc8081
Update Redeploy_Fabric_Manager_Nodes.md
ravikanth-nalla-hpe Jun 4, 2026
2637fc8
Update Add_Remove_Replace_NCNs.md
ravikanth-nalla-hpe Jun 4, 2026
5f227ae
Update Add_Remove_Replace_NCNs.md
ravikanth-nalla-hpe Jun 4, 2026
185ad35
Update README.md
ravikanth-nalla-hpe Jun 4, 2026
23a049d
Apply suggestions from code review
ravikanth-nalla-hpe Jun 8, 2026
aeee076
Update Configure_FM_On_Baremetal.md
ravikanth-nalla-hpe Jun 9, 2026
f866d30
Apply suggestions from code review
ravikanth-nalla-hpe Jun 9, 2026
c2b1e3f
Apply suggestions from code review
ravikanth-nalla-hpe Jun 9, 2026
8504727
Update Configure_FM_On_Baremetal.md
ravikanth-nalla-hpe Jun 9, 2026
cf22db0
Update Redeploy_Fabric_Manager_Nodes.md
ravikanth-nalla-hpe Jun 9, 2026
640c619
Update README.md
ravikanth-nalla-hpe Jun 9, 2026
f261b9e
Update Configure_FM_On_Baremetal.md
ravikanth-nalla-hpe Jun 11, 2026
7060742
Merge branch 'release/1.7' into CASM-5740-fm-ha
ravikanth-nalla-hpe Jun 11, 2026
30cd70d
Update Configure_FM_On_Baremetal.md
ravikanth-nalla-hpe Jun 11, 2026
c9887f5
CASM-5740: FM migration on baremetal nodes (FMNs)
ravikanth-nalla-hpe Jun 11, 2026
a751fae
Apply suggestion from @sravani-sanigepalli
ravikanth-nalla-hpe Jun 16, 2026
2f7c7d8
Apply suggestions from code review
ravikanth-nalla-hpe Jun 16, 2026
1ce4624
Update Add_NCN_Data.md
ravikanth-nalla-hpe Jun 16, 2026
cc5bd35
Update Configure_FM_On_Baremetal.md
ravikanth-nalla-hpe Jun 16, 2026
c1a7fa0
Update Configure_FM_On_Baremetal.md
ravikanth-nalla-hpe Jun 16, 2026
42b5b5a
Update Configure_FM_On_Baremetal.md
ravikanth-nalla-hpe Jun 16, 2026
83dda88
Update README.md
ravikanth-nalla-hpe Jun 16, 2026
1f16e78
Clarify NCN procedures in documentation
sravani-sanigepalli Jun 16, 2026
034c350
Update Configure_FM_On_Baremetal.md
sravani-sanigepalli Jun 16, 2026
cf1b055
Update Configure_FM_On_Baremetal.md
sravani-sanigepalli Jun 16, 2026
4 changes: 4 additions & 0 deletions .spelling
@@ -872,6 +872,10 @@ xnames
zeroization
zeroize
zypper
Baremetal
baremetal
FabricManager
FMNs

# Network Terms - Starting to organize a little bit here but it's not done
0.5m
13 changes: 13 additions & 0 deletions glossary.md
@@ -34,6 +34,8 @@ Glossary of terms used in CSM documentation.
* [EX Compute Cabinet](#ex-compute-cabinet)
* [EX TDS Cabinet](#ex-tds-cabinet)
* [Fabric](#fabric)
* [Fabric Manager](#fabric-manager)
* [Fabric Manager Node](#fabric-manager-node)
* [Firmware Action Service (FAS)](#firmware-action-service-fas)
* [Floor Standing CDU](#floor-standing-cdu)
* [Hardware Management Network (HMN)](#hardware-management-network-hmn)
@@ -376,6 +378,17 @@ compute blades and 16 [High Speed Network (HSN)](#high-speed-network-hsn) switch
The [Slingshot](#slingshot) fabric consists of the switches, cables, ports, topology policy, and
configuration settings for the Slingshot [High-Speed Network](#high-speed-network-hsn).

## Fabric Manager

The [Slingshot](#slingshot) Fabric Manager is a suite of software that configures, manages, and monitors the network. It runs on an
external server and communicates with the switches over the out-of-band management network.

## Fabric Manager Node

The [Slingshot](#slingshot) Fabric Manager runs on at least one dedicated server, referred to as the HPE Slingshot Fabric Manager Node (FMN). It also runs on the
HPE Slingshot switches. The HPE Slingshot Fabric Manager software is installed on a bare metal server (the FMN) rather than in Kubernetes pods in order to support
systems with HPE Slingshot version 3.0.0 and above, as well as High Availability (HA) requirements.

## Firmware Action Service (FAS)

The Firmware Action Service (FAS) provides an interface for managing firmware versions of Redfish-enabled hardware in the system.
9 changes: 9 additions & 0 deletions install/README.md
@@ -74,6 +74,7 @@ shown here with numbered topics.
1. [Kubernetes encryption](#1-kubernetes-encryption)
1. [Export Nexus data](#2-export-nexus-data)
- [Installation of additional HPE Cray EX software products](#installation-of-additional-hpe-cray-ex-software-products)
- [Fabric Manager Node redeployment](#fabric-manager-node-redeployment)

> **`NOTE`** If problems are encountered during the installation,
> [Troubleshooting installation problems](#12-troubleshooting-installation-problems) and
@@ -336,3 +337,11 @@ See the [Install or upgrade additional products with IUF](../operations/iuf/work
procedure to continue with the installation of additional HPE Cray EX software products.

For additional information on the IUF, see [Install and Upgrade Framework](../operations/iuf/IUF.md).

## Fabric Manager Node redeployment

> **OPTIONAL:** This section applies only if Fabric Manager nodes were deployed during the CSM installation.

After additional HPE Cray EX software products have been installed, Fabric Manager nodes need to be redeployed with a new customized image.

See [Redeploy Fabric Manager Nodes](../operations/fm_on_baremetal/Redeploy_Fabric_Manager_Nodes.md).
76 changes: 64 additions & 12 deletions install/deploy_non-compute_nodes.md
@@ -2,7 +2,7 @@

The following procedure deploys Linux and Kubernetes software to the management NCNs.
Deployment of the nodes starts with booting the storage nodes, followed by the master nodes
and worker nodes together.
and worker nodes together. Optionally, HPE Slingshot Fabric Manager nodes can also be deployed.

After the operating system boots on each node, there are some configuration actions which
take place. Watching the console or the console log for certain nodes can help to understand
@@ -26,7 +26,8 @@ the number of storage and worker nodes.
1. [Deploy management nodes](#2-deploy-management-nodes)
1. [Deploy storage NCNs](#21-deploy-storage-ncns)
1. [Deploy Kubernetes NCNs](#22-deploy-kubernetes-ncns)
1. [Configure `kubectl` on the PIT](#23-configure-kubectl-on-the-pit)
1. [Deploy HPE Slingshot Fabric Manager nodes (optional)](#23-deploy-hpe-slingshot-fabric-manager-nodes-optional)
1. [Configure `kubectl` on the PIT](#24-configure-kubectl-on-the-pit)
1. [Validate deployment](#3-validate-deployment)
1. [Next topic](#next-topic)

@@ -52,9 +53,11 @@ Preparation of the environment must be done before attempting to deploy the mana
> These values do not need to be altered from what is shown.

```bash
export IPMI_PASSWORD ; mtoken='ncn-m(?!001)\w+-mgmt' ; stoken='ncn-s\w+-mgmt' ; wtoken='ncn-w\w+-mgmt'
export IPMI_PASSWORD ; mtoken='ncn-m(?!001)\w+-mgmt' ; stoken='ncn-s\w+-mgmt' ; wtoken='ncn-w\w+-mgmt' ; ftoken='fmn\w+-mgmt'
```

> **NOTE:** The `ftoken` variable is used for HPE Slingshot Fabric Manager nodes, which are optional and not present on all systems.
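
The token patterns above can be tried against sample data before pointing them at the real file. The following is a minimal sketch, not part of the procedure: the `host-record=` lines and IP addresses are hypothetical and only stand in for `/etc/dnsmasq.d/statics.conf`, whose actual layout may differ. It assumes GNU `grep` with `-P` support.

```bash
# Token patterns exactly as defined in the step above.
mtoken='ncn-m(?!001)\w+-mgmt'
stoken='ncn-s\w+-mgmt'
wtoken='ncn-w\w+-mgmt'
ftoken='fmn\w+-mgmt'

# Hypothetical sample data standing in for /etc/dnsmasq.d/statics.conf.
sample="$(mktemp)"
cat > "${sample}" <<'EOF'
host-record=ncn-m001-mgmt,10.254.1.2
host-record=ncn-m002-mgmt,10.254.1.4
host-record=ncn-s001-mgmt,10.254.1.6
host-record=ncn-w001-mgmt,10.254.1.8
host-record=fmn001-mgmt,10.254.1.20
host-record=fmn001-mgmt,10.254.1.20
EOF

# Extract the unique BMC names selected by the combined pattern.
# The negative lookahead in mtoken leaves out ncn-m001-mgmt.
matched="$(grep -oP "(${mtoken}|${stoken}|${wtoken}|${ftoken})" "${sample}" | sort -u)"
echo "${matched}"

rm -f "${sample}"
```

In this sample, `ncn-m001-mgmt` is skipped by the lookahead in `mtoken`, and the duplicated `fmn001-mgmt` entry is collapsed by `sort -u`.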

### 1.2. BIOS baseline

1. (`pit#`) If the NCNs are HPE hardware, then ensure that DCMI/IPMI is enabled.
@@ -68,14 +71,16 @@ Preparation of the environment must be done before attempting to deploy the mana
1. (`pit#`) Check power status of all NCNs.

```bash
grep -oP "(${mtoken}|${stoken}|${wtoken})" /etc/dnsmasq.d/statics.conf | sort -u |
grep -oP "(${mtoken}|${stoken}|${wtoken}|${ftoken})" /etc/dnsmasq.d/statics.conf | sort -u |
xargs -t -i ipmitool -I lanplus -U "${USERNAME}" -E -H {} power status
```

> **NOTE:** If the system does not have HPE Slingshot Fabric Manager nodes, the `ftoken` pattern will not match any entries.

1. (`pit#`) Power off all NCNs.

```bash
grep -oP "(${mtoken}|${stoken}|${wtoken})" /etc/dnsmasq.d/statics.conf | sort -u |
grep -oP "(${mtoken}|${stoken}|${wtoken}|${ftoken})" /etc/dnsmasq.d/statics.conf | sort -u |
xargs -t -i ipmitool -I lanplus -U "${USERNAME}" -E -H {} power off
```

@@ -89,16 +94,16 @@ Preparation of the environment must be done before attempting to deploy the mana
- Disable VT-x, AMD-V, SVM, VT-d, and AMD IOMMU for Virtualization, on both AMD and Intel CPUs; there is no way to enable at this time.

```bash
grep -oP "(${mtoken}|${stoken}|${wtoken})" /etc/dnsmasq.d/statics.conf | sort -u |
grep -oP "(${mtoken}|${stoken}|${wtoken}|${ftoken})" /etc/dnsmasq.d/statics.conf | sort -u |
xargs -t -i ipmitool -I lanplus -U "${USERNAME}" -E -H {} chassis bootdev none options=clear-cmos
```

1. (`pit#`) Boot NCNs to BIOS to allow the CMOS to reinitialize.

```bash
grep -oP "(${mtoken}|${stoken}|${wtoken})" /etc/dnsmasq.d/statics.conf | sort -u |
grep -oP "(${mtoken}|${stoken}|${wtoken}|${ftoken})" /etc/dnsmasq.d/statics.conf | sort -u |
xargs -t -i ipmitool -I lanplus -U "${USERNAME}" -E -H {} chassis bootdev bios options=efiboot
grep -oP "(${mtoken}|${stoken}|${wtoken})" /etc/dnsmasq.d/statics.conf | sort -u |
grep -oP "(${mtoken}|${stoken}|${wtoken}|${ftoken})" /etc/dnsmasq.d/statics.conf | sort -u |
xargs -t -i ipmitool -I lanplus -U "${USERNAME}" -E -H {} power on
```

@@ -113,7 +118,7 @@ Preparation of the environment must be done before attempting to deploy the mana
1. (`pit#`) Power off the nodes.

```bash
grep -oP "(${mtoken}|${stoken}|${wtoken})" /etc/dnsmasq.d/statics.conf | sort -u |
grep -oP "(${mtoken}|${stoken}|${wtoken}|${ftoken})" /etc/dnsmasq.d/statics.conf | sort -u |
xargs -t -i ipmitool -I lanplus -U "${USERNAME}" -E -H {} power off
```
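
The power commands in this section all share one pipeline shape, so it can help to preview which `ipmitool` invocations a pattern would generate before running them. The sketch below is an illustration only: it substitutes `echo` for the real command, uses a hypothetical sample file and a placeholder `USERNAME` value, and uses `xargs -I {}` (GNU `xargs` documents `-i` as deprecated in favor of `-I`). It also shows that when no FMN entries exist, the `ftoken` pattern yields no commands.

```bash
# Hypothetical stand-ins so the sketch runs anywhere: a sample statics file
# and a placeholder BMC user name.
USERNAME='root'
wtoken='ncn-w\w+-mgmt'
ftoken='fmn\w+-mgmt'
statics="$(mktemp)"
printf 'host-record=ncn-w001-mgmt,10.254.1.8\nhost-record=fmn001-mgmt,10.254.1.20\n' > "${statics}"

# Dry run: "echo" prints each ipmitool command line instead of executing it.
planned="$(grep -oP "(${wtoken}|${ftoken})" "${statics}" | sort -u |
    xargs -I {} echo ipmitool -I lanplus -U "${USERNAME}" -E -H {} power status)"
echo "${planned}"

# With no FMN entries in the file, the ftoken pattern matches nothing and
# xargs generates no commands.
grep -v 'fmn' "${statics}" > "${statics}.nofmn"
planned_nofmn="$(grep -oP "${ftoken}" "${statics}.nofmn" | sort -u |
    xargs -I {} echo ipmitool -I lanplus -U "${USERNAME}" -E -H {} power status)"

rm -f "${statics}" "${statics}.nofmn"
```

Removing `echo` turns the preview back into the real command, so the dry run and the actual step stay identical apart from that one word.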

@@ -134,8 +139,8 @@ for all nodes, the Ceph storage will have been initialized and the Kubernetes cl
1. (`pit#`) Set each node to always UEFI network boot, and ensure that they are powered off.

```bash
grep -oP "(${mtoken}|${stoken}|${wtoken})" /etc/dnsmasq.d/statics.conf | sort -u | xargs -t -i ipmitool -I lanplus -U "${USERNAME}" -E -H {} chassis bootdev pxe options=efiboot,persistent
grep -oP "(${mtoken}|${stoken}|${wtoken})" /etc/dnsmasq.d/statics.conf | sort -u | xargs -t -i ipmitool -I lanplus -U "${USERNAME}" -E -H {} power off
grep -oP "(${mtoken}|${stoken}|${wtoken}|${ftoken})" /etc/dnsmasq.d/statics.conf | sort -u | xargs -t -i ipmitool -I lanplus -U "${USERNAME}" -E -H {} chassis bootdev pxe options=efiboot,persistent
grep -oP "(${mtoken}|${stoken}|${wtoken}|${ftoken})" /etc/dnsmasq.d/statics.conf | sort -u | xargs -t -i ipmitool -I lanplus -U "${USERNAME}" -E -H {} power off
```

> **NOTE:** The NCN boot order is further explained in [NCN Boot Workflow](../background/ncn_boot_workflow.md).
@@ -243,7 +248,54 @@ for all nodes, the Ceph storage will have been initialized and the Kubernetes cl

> **NOTE:** To exit a conman console, press `&` followed by a `.` (e.g. keystroke `&.`)

### 2.3 Configure `kubectl` on the PIT
### 2.3 Deploy HPE Slingshot Fabric Manager nodes (optional)

> **NOTE:** This section only applies to systems with HPE Slingshot Fabric Manager nodes. If the system does not have Fabric Manager nodes, skip this section and proceed to [Configure `kubectl` on the PIT](#24-configure-kubectl-on-the-pit).

HPE Slingshot Fabric Manager nodes have hostnames like `fmn001`, `fmn002`, etc., with corresponding BMC names like `fmn001-mgmt`, `fmn002-mgmt`, etc.

1. (`pit#`) Verify that Fabric Manager nodes are present in the system.

```bash
grep -oP "${ftoken}" /etc/dnsmasq.d/statics.conf | sort -u
```

If this command returns no output, there are no Fabric Manager nodes to deploy. Skip the remaining steps in this section.

1. (`pit#`) Check power status of Fabric Manager nodes.

```bash
grep -oP "${ftoken}" /etc/dnsmasq.d/statics.conf | sort -u | xargs -t -i ipmitool -I lanplus -U "${USERNAME}" -E -H {} power status
```

1. (`pit#`) Boot the **Fabric Manager nodes**.

```bash
grep -oP "${ftoken}" /etc/dnsmasq.d/statics.conf | sort -u | xargs -t -i ipmitool -I lanplus -U "${USERNAME}" -E -H {} power on
```

1. (`pit#`) Observe the installation through the console of the first Fabric Manager node.

```bash
conman -j fmn001-mgmt
```

> **NOTES:**
>
> - If the nodes have PXE boot issues (e.g. getting PXE errors, not pulling the `ipxe.efi` binary), then see [Troubleshooting PXE Boot](troubleshooting_pxe_boot.md).
> - To exit a conman console, press `&` followed by a `.` (e.g. keystroke `&.`)

1. (`pit#`) Wait for the Fabric Manager nodes to complete `cloud-init`.

The following text should appear in the console:

```text
The system is finally up, after XXXX.XX seconds cloud-init has come to completion.
```

> **NOTE:** The duration reported will vary.
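
The checks in this section (skip when no Fabric Manager nodes exist, then wait for the `cloud-init` completion message) could be scripted roughly as follows. This is a sketch under stated assumptions, not part of the documented procedure: the helper function names, the `console.<bmc>` log file naming, and the sample data are all hypothetical. Only the `ftoken` pattern and the completion message text come from the steps above.

```bash
# Hypothetical helpers; only ftoken and the completion message come from the
# procedure. The console.<bmc> log naming is an assumption for this demo.
ftoken='fmn\w+-mgmt'
done_msg='The system is finally up, after [0-9.]+ seconds cloud-init has come to completion'

# Print the unique FMN BMC names found in a dnsmasq statics file ($1).
list_fmn_bmcs() {
    grep -oP "${ftoken}" "$1" | sort -u
}

# Succeed if the console log ($1) contains the cloud-init completion message.
cloud_init_done() {
    grep -Eq "${done_msg}" "$1"
}

# Report one status line per FMN BMC; $1 = statics file, $2 = log directory.
report_fmn_status() {
    local bmcs bmc
    bmcs="$(list_fmn_bmcs "$1")"
    if [ -z "${bmcs}" ]; then
        echo "no Fabric Manager nodes found; skipping"
        return 0
    fi
    for bmc in ${bmcs}; do
        if cloud_init_done "$2/console.${bmc}"; then
            echo "${bmc}: cloud-init complete"
        else
            echo "${bmc}: still waiting"
        fi
    done
}

# Demo against throwaway sample data.
workdir="$(mktemp -d)"
printf 'host-record=fmn001-mgmt,10.254.1.20\nhost-record=fmn002-mgmt,10.254.1.21\n' \
    > "${workdir}/statics.conf"
echo 'The system is finally up, after 412.37 seconds cloud-init has come to completion.' \
    > "${workdir}/console.fmn001-mgmt"
echo 'cloud-init still running' > "${workdir}/console.fmn002-mgmt"
status="$(report_fmn_status "${workdir}/statics.conf" "${workdir}")"
echo "${status}"

# An empty statics file exercises the "no Fabric Manager nodes" path.
: > "${workdir}/empty.conf"
empty_status="$(report_fmn_status "${workdir}/empty.conf" "${workdir}")"
echo "${empty_status}"
rm -rf "${workdir}"
```

The duration in the message is matched with `[0-9.]+` because the reported value varies from node to node.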

### 2.4 Configure `kubectl` on the PIT

1. (`pit#`) This was done in a previous step, but if the user is resuming/starting here then the first master needs to be
redefined.