xCAT (Extreme Cloud Administration Toolkit) is an open-source cluster management tool designed to manage HPC (High-Performance Computing) clusters, cloud environments, and large-scale server farms. It provides a single command-line interface to install, configure, and manage clusters efficiently.
🔹 Key Features:
✔️ Automated OS installation (diskful and diskless)
✔️ Hardware discovery and management
✔️ Network configuration and management
✔️ Power control and remote booting
✔️ Monitoring and logging
xCAT uses a client-server architecture in which all nodes are managed centrally from a Management Node (MN).
- Management Node (MN) 🖥️: Central controller that manages the compute nodes.
- Compute Nodes (CN) 🖥️: Worker nodes that run workloads.
- Service Nodes (SN) (optional): Intermediate nodes that handle booting and installation tasks in large clusters.
- xCAT Database: Stores node definitions, OS image definitions, and other cluster settings.
Communication in xCAT:
xCAT uses SSH, PXE boot, TFTP, HTTP, and DHCP to interact with nodes.
Nodes in an xCAT cluster need an operating system to function. xCAT supports two installation methods:
1️⃣ Diskful (stateful) installation
✅ The OS is installed on the local hard disk of each node.
✅ Storage is persistent; changes remain after a reboot.
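🔹 Command to mark a node for OS installation (newer xCAT releases document this step as nodeset <node_name> osimage=<image_name>):
nodeset <node_name> install
🔹 Command to reboot the node and start the installation:
rpower <node_name> boot
🔹 Command to check the provisioning method (install or netboot) of each defined OS image:
lsdef -t osimage -i provmethod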
2️⃣ Diskless (stateless) installation
✅ Nodes boot the OS into RAM from a network image.
✅ No OS is permanently installed on the node.
🔹 Command to configure a node for diskless booting:
nodeset <node_name> netboot
To add compute nodes, define them in xCAT's database:
mkdef -t node <node_name> groups=compute
🔹 List all nodes:
lsdef -t node
🔹 Remove a node:
rmdef <node_name>
xCAT allows administrators to control node power remotely:
- Power on a node:
rpower <node_name> on
- Power off a node:
rpower <node_name> off
- Reboot a node:
rpower <node_name> reset
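Powering on many nodes at the same moment can overload power circuits, so power operations are often staggered. This is a minimal sketch. It assumes rpower is on the PATH of the management node. The function name rolling_power is hypothetical. With DRY_RUN=1 (the default here), it only prints the commands it would run.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: apply one rpower action to nodes one at a time,
# sleeping after each node so they do not all draw power at once.
# With DRY_RUN=1 (the default) the commands are printed, not executed.
rolling_power() {
  local action=$1 delay=$2
  shift 2
  local node
  for node in "$@"; do
    if [[ ${DRY_RUN:-1} == 1 ]]; then
      echo "rpower $node $action"
    else
      rpower "$node" "$action"
      sleep "$delay"
    fi
  done
}

rolling_power on 5 compute01 compute02 compute03
```

Set DRY_RUN=0 to execute the commands. The delay argument is in seconds.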
To open a remote console on a node:
rcons <node_name>
xCAT can automatically configure DHCP to assign IPs to nodes.
- To generate the DHCP server configuration:
makedhcp -n
- To add the defined nodes to DHCP:
makedhcp -a
To configure DNS for the nodes:
makedns -n
To list the networks xCAT knows about:
lsdef -t network
OS images define how nodes will be installed or booted.
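To create a diskless OS image (osvers and osarch are xCAT's osimage attribute names; osvers must match a distribution imported with copycds):
mkdef -t osimage myimage provmethod=netboot osvers=rhels8 osarch=x86_64
To list the defined OS images:
lsdef -t osimage
To assign the image to a node:
nodeset <node_name> osimage=myimage
To view logs of node activities: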
tail -f /var/log/xcat/cluster.log
Check whether nodes are reachable:
ping <node_name>
List the nodes defined in xCAT, and check the state xCAT reports for a node:
nodels
nodestat <node_name>
Run a command on all compute nodes simultaneously:
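pdsh -w node[01-10] uptime
xCAT's own parallel shell does the same for a node group:
xdsh compute uptime
To run custom actions on a node during provisioning, create a postscript: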
vi /install/postscripts/myscript.sh
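chmod +x /install/postscripts/myscript.sh
Then add it to the node's postscripts (-p appends to the existing list instead of replacing it):
chdef -t node <node_name> -p postscripts=myscript.sh
xCAT can also deploy virtual machines and provision the bare-metal hosts underneath cloud platforms such as OpenStack and Kubernetes.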
| Task | Command |
|---|---|
| Add a node | mkdef -t node <node_name> groups=compute |
| List nodes | lsdef -t node |
| Remove a node | rmdef <node_name> |
| Power on a node | rpower <node_name> on |
| Reboot a node | rpower <node_name> reset |
| Set OS install | nodeset <node_name> install |
| Set OS diskless boot | nodeset <node_name> netboot |
| Check OS image provisioning method | lsdef -t osimage -i provmethod |
| View logs | tail -f /var/log/xcat/cluster.log |
xCAT is a powerful, scalable, and automated cluster management tool used in HPC, cloud computing, and data center automation. By using xCAT, system administrators can quickly deploy, monitor, and manage thousands of nodes efficiently.
Let's dive into advanced xCAT concepts, covering custom configurations, automation, troubleshooting, and cloud integration.
For diskless nodes, the OS runs in RAM. You can customize the image before deploying it.
1️⃣ Import the installation media and define the image
copycds /path/to/rhel-8.iso
mkdef -t osimage rhel8-diskless provmethod=netboot osvers=rhels8 osarch=x86_64 profile=compute
2️⃣ Modify the image (e.g., add packages)
chdef -t osimage rhel8-diskless otherpkglist=/install/custom/packages.lst
3️⃣ Build the OS image, then pack it for network boot
genimage rhel8-diskless
packimage rhel8-diskless
4️⃣ Deploy the diskless image
nodeset compute01 osimage=rhel8-diskless
rpower compute01 reset
💡 Tip: Store applications on GPFS/NFS instead of in the RAM disk to reduce memory usage.
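The diskless steps can be collected into one script. This is a sketch, not an xCAT tool. It assumes the osimage already exists and includes packimage, which packs the root image built by genimage into the form that netbooting nodes download. The run and deploy_diskless helpers and the DRY_RUN switch are hypothetical. By default, the script only prints the commands.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print a command when DRY_RUN=1 (the default); otherwise execute it.
run() {
  if [[ ${DRY_RUN:-1} == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Build the diskless image once, then point each node at it and boot it.
# Order matters: genimage builds the root image, packimage packs it, and only
# then can nodeset and rpower boot nodes from it.
deploy_diskless() {
  local image=$1
  shift
  run genimage "$image"
  run packimage "$image"
  local node
  for node in "$@"; do
    run nodeset "$node" "osimage=$image"
    run rpower "$node" boot   # boot: power on if off, hard reset if already on
  done
}

deploy_diskless rhel8-diskless compute01 compute02
```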
For stateful nodes, the OS is installed on the local disk.
mkdef -t osimage rhel8-stateful provmethod=install osvers=rhels8 osarch=x86_64 profile=compute
🔹 Customize the OS installation (Kickstart for RHEL, AutoYaST for SLES, Preseed for Ubuntu). xCAT install templates are these files with xCAT placeholders added. Starting from a template shipped under /opt/xcat/share/xcat/install/ is therefore safer than writing one from scratch:
vi /install/custom/kickstart.cfg
chdef -t osimage rhel8-stateful template=/install/custom/kickstart.cfg
🔹 Deploy the OS:
nodeset compute01 osimage=rhel8-stateful
rpower compute01 boot
💡 Tip: Use network-based storage (NFS, GPFS, Ceph) for shared persistent data across nodes.
Create a Bash script to deploy nodes automatically:
#!/bin/bash
set -euo pipefail   # stop at the first failing command
# Define a list of nodes
NODES=("compute01" "compute02" "compute03")
# Define each node, assign the diskless image, and boot it
for NODE in "${NODES[@]}"; do
  mkdef -t node "$NODE" groups=compute
  nodeset "$NODE" osimage=rhel8-diskless
  rpower "$NODE" boot
done
# rpower only starts the boot; provisioning continues in the background
echo "Deployment started on all nodes."
💡 Tip: Add error handling and logging for better debugging.
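Following that tip, here is one way to add error handling and logging. It is a sketch built on assumptions: the log path, the DRY_RUN switch, and the node list are placeholders. If a node fails, the script records it and moves on instead of aborting the whole run. It exits with status 1 if any node failed.

```bash
#!/usr/bin/env bash
set -uo pipefail   # no -e: failures are handled per node below

LOG=${LOG:-/tmp/xcat-deploy.log}   # hypothetical log location
IMAGE=rhel8-diskless
NODES=("compute01" "compute02" "compute03")

# Timestamped message to the terminal and the log file.
log() { printf '%s %s\n' "$(date '+%F %T')" "$*" | tee -a "$LOG"; }

# Run a command (or only log it when DRY_RUN=1, the default); log failures.
run() {
  if [[ ${DRY_RUN:-1} == 1 ]]; then
    log "DRY-RUN: $*"
    return 0
  fi
  "$@" >>"$LOG" 2>&1 || { log "FAILED: $*"; return 1; }
}

failed=()
for node in "${NODES[@]}"; do
  # Stop at the first failing step for this node, then continue with the next node.
  if run mkdef -t node "$node" groups=compute &&
     run nodeset "$node" "osimage=$IMAGE" &&
     run rpower "$node" boot; then
    log "deployment started for $node"
  else
    failed+=("$node")
  fi
done

if (( ${#failed[@]} > 0 )); then
  log "failed nodes: ${failed[*]}"
  exit 1
fi
log "deployment started on all ${#NODES[@]} nodes"
```

Re-running the script against nodes that are already defined will record mkdef failures for them, so the node list should contain only new nodes.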
Ansible + xCAT = full automation of cluster management.
1️⃣ Install Ansible
yum install ansible -y
2️⃣ Create an Ansible playbook (deploy_xcat.yml)
- name: Deploy xCAT Cluster
  hosts: xcat_mgmt
  become: yes
  tasks:
    - name: Add nodes to xCAT
      command: mkdef -t node compute01 groups=compute
    - name: Set OS image for compute nodes
      command: nodeset compute01 osimage=rhel8-diskless
    - name: Boot compute nodes
      command: rpower compute01 boot
3️⃣ Run the playbook (the xcat_mgmt group must be defined in your Ansible inventory)
ansible-playbook deploy_xcat.yml
💡 Tip: Use Ansible for scaling, security hardening, and monitoring.
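The playbook targets the host group xcat_mgmt, which must exist in an Ansible inventory. This minimal sketch writes an INI-style inventory. The host name xcatmn is a hypothetical placeholder for your management node.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Write an INI-style Ansible inventory containing the xcat_mgmt group
# that the playbook targets. Arguments: output file, then one or more hosts.
write_inventory() {
  local file=$1
  shift
  {
    echo "[xcat_mgmt]"
    printf '%s\n' "$@"
  } >"$file"
}

inv=$(mktemp)
write_inventory "$inv" xcatmn
cat "$inv"
```

The playbook can then be run with ansible-playbook -i <inventory file> deploy_xcat.yml.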
Useful xCAT log files:
- Cluster-wide log of the xCAT daemon
tail -f /var/log/xcat/cluster.log
- Provisioning messages forwarded from the compute nodes
tail -f /var/log/xcat/computes.log
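Besides following the logs, you can watch a provisioning run by polling nodestat until the node reaches the expected state. This is a minimal sketch. It assumes nodestat prints lines of the form <node>: <state> (for example, compute01: sshd once SSH is up). The function name and the timeout and interval defaults are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Poll `nodestat <node>` until it reports the wanted state or the timeout expires.
# Assumption: nodestat prints "<node>: <state>" lines, e.g. "compute01: sshd".
# Arguments: node, wanted state, timeout in seconds (default 1800), poll interval (default 30).
wait_for_status() {
  local node=$1 wanted=$2 timeout=${3:-1800} interval=${4:-30}
  local waited=0 state=""
  while (( waited < timeout )); do
    # Keep only the state field of this node's line; ignore nodestat errors.
    state=$(nodestat "$node" 2>/dev/null | awk -F': ' -v n="$node" '$1 == n {print $2}') || true
    if [[ $state == "$wanted" ]]; then
      echo "$node reached state '$wanted' after ${waited}s"
      return 0
    fi
    sleep "$interval"
    waited=$(( waited + interval ))
  done
  echo "timeout: $node is in state '${state:-unknown}', not '$wanted'" >&2
  return 1
}

# Example (needs xCAT): wait up to 30 minutes for compute01 to accept SSH logins.
# wait_for_status compute01 sshd 1800 30
```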
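If a node fails to install:
1️⃣ Check the node status
nodestat compute01
2️⃣ Read recent system log entries on the node (if it is reachable over SSH)
xdsh compute01 "tail -n 100 /var/log/messages"
3️⃣ Restart the installation
nodeset compute01 install
rpower compute01 boot
💡 Tip: Enable xCAT's debug mode for more detailed logs:
chtab key=xcatdebugmode site.value=1
To define the network for VLAN 100 (VLAN tagging on the nodes' own interfaces is configured separately, for example with xCAT's confignetwork postscript):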
mkdef -t network -o vlan100 net=192.168.100.0 mask=255.255.255.0 mgtifname=eth1 vlanid=100
Regenerate /etc/hosts and the DHCP configuration:
makehosts
makedhcp -n
💡 Tip: Use VLANs to separate management, compute, and storage networks.
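The net= and mask= values of an xCAT network definition must agree: net should be the network address of the subnet that mask describes. Here is a pure-bash sketch for checking such values before defining the network. The function names are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<<"$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Convert a 32-bit integer back to dotted-quad notation.
int_to_ip() {
  local n=$1
  echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}

# Count the leading 1 bits of a netmask, e.g. 255.255.255.0 -> 24.
mask_to_prefix() {
  local m bits=0
  m=$(ip_to_int "$1")
  while (( m & 0x80000000 )); do
    bits=$(( bits + 1 ))
    m=$(( (m << 1) & 0xFFFFFFFF ))
  done
  echo "$bits"
}

# Network address of an IP under a mask, e.g. 192.168.100.17 + 255.255.255.0 -> 192.168.100.0.
network_of() {
  int_to_ip $(( $(ip_to_int "$1") & $(ip_to_int "$2") ))
}

echo "net=$(network_of 192.168.100.17 255.255.255.0) prefix=$(mask_to_prefix 255.255.255.0)"
```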
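1️⃣ Restrict access to xcatd (TCP port 3001) to trusted networks; accept loopback first so that xCAT commands run on the management node keep working
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -s 192.168.1.0/24 -p tcp --dport 3001 -j ACCEPT
iptables -A INPUT -p tcp --dport 3001 -j DROP
2️⃣ Secure Boot and Trusted Platform Module (TPM)
Secure Boot and TPM themselves are switched on in each node's firmware. The command below only adds the packages listed in security.lst to the image:
chdef -t osimage rhel8-secure otherpkglist=/install/custom/security.lst
xCAT supports VM-based clusters using KVM or OpenStack.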
1️⃣ Define a virtual node on a KVM host (vmmemory is given in MB)
mkdef -t node vm01 groups=compute mgt=kvm vmhost=<kvm_host> vmcpus=4 vmmemory=8192
2️⃣ Attach the VM to a network bridge and assign its IP
chdef vm01 vmnics=br0 ip=192.168.1.100
3️⃣ Create and start the VM
mkvm vm01
rpower vm01 on
💡 Tip: Use PXE boot with KVM/QEMU for automated virtual cluster provisioning.
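As far as we know, xCAT's vmmemory attribute is expressed in megabytes, so a size such as 8GB needs converting. Check the vm table description for your release to confirm. Here is a small hypothetical helper that assumes binary units (1 GB = 1024 MB):

```bash
#!/usr/bin/env bash
set -euo pipefail

# Convert sizes such as 8GB, 2g or 512MB to megabytes (binary units: 1 GB = 1024 MB).
# A bare number is taken to be megabytes already.
to_mb() {
  local size=${1^^}   # upper-case so that 8gb and 8GB are treated alike
  if [[ $size =~ ^([0-9]+)(G|GB)$ ]]; then
    echo $(( 10#${BASH_REMATCH[1]} * 1024 ))
  elif [[ $size =~ ^([0-9]+)(M|MB)?$ ]]; then
    echo $(( 10#${BASH_REMATCH[1]} ))
  else
    echo "unrecognised size: $1" >&2
    return 1
  fi
}

echo "vmmemory=$(to_mb 8GB)"
```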
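xCAT can provision the bare-metal nodes of a Kubernetes cluster and run the setup commands on them.
1️⃣ Install Kubernetes on the nodes (a Kubernetes package repository must already be configured on them)
xdsh compute01 "yum install -y kubeadm kubelet kubectl"
2️⃣ Initialize the Kubernetes cluster on the control-plane node
xdsh compute01 "kubeadm init"
3️⃣ Deploy a workload (from a host whose kubeconfig points at the new cluster)
kubectl apply -f hpc-job.yaml
💡 Tip: Use MetalLB for load balancing in bare-metal Kubernetes clusters.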
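The hpc-job.yaml file is not shown here. A minimal Kubernetes Job manifest is one possibility. This sketch writes such a file to a temporary path. The job name, the busybox image, and the command are placeholders for a real HPC workload.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Write a minimal batch/v1 Job manifest.
# Arguments: output file, job name, container image (all placeholders here).
write_job() {
  local file=$1 name=$2 image=$3
  cat >"$file" <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: $name
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: $name
          image: $image
          command: ["sh", "-c", "echo running on \$(hostname)"]
EOF
}

job=$(mktemp)
write_job "$job" hpc-job busybox:1.36
cat "$job"
```

Saved as hpc-job.yaml, the manifest can be applied with kubectl apply -f hpc-job.yaml, and kubectl logs job/hpc-job then shows its output.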
To avoid a single point of failure, use a redundant xCAT management node.
1️⃣ Define a secondary management node
mkdef -t node xcat-ha groups=mgmt
2️⃣ Synchronize the xCAT database (dump it, copy it, and restore it on the standby node)
dumpxCATdb -p /tmp/xcatdb
rsync -av /tmp/xcatdb xcat-ha:/tmp/
ssh xcat-ha restorexCATdb -p /tmp/xcatdb
3️⃣ Use Keepalived to fail over a virtual IP
yum install keepalived -y
vi /etc/keepalived/keepalived.conf
systemctl enable --now keepalived
💡 Tip: Use Pacemaker + Corosync for advanced HA setups.
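The keepalived.conf file needs at least one VRRP instance that moves a virtual IP between the two management nodes. This minimal sketch generates one. The interface eth0, virtual_router_id 51, the priorities 100/90, and the address 192.168.1.250/24 are hypothetical values. Keepalived only moves the IP address; keeping the xCAT data on both nodes in sync remains a separate task.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Generate a minimal keepalived VRRP configuration for one management node.
# Arguments: output file, role (MASTER or BACKUP), interface, virtual IP/prefix.
write_keepalived_conf() {
  local file=$1 role=$2 iface=$3 vip=$4
  local priority=100
  if [[ $role == BACKUP ]]; then
    priority=90   # the standby must advertise a lower priority than the master
  fi
  cat >"$file" <<EOF
vrrp_instance XCAT_MN {
    state $role
    interface $iface
    virtual_router_id 51
    priority $priority
    advert_int 1
    virtual_ipaddress {
        $vip
    }
}
EOF
}

conf=$(mktemp)
write_keepalived_conf "$conf" MASTER eth0 192.168.1.250/24
cat "$conf"
```

On the standby node, the same call with BACKUP produces the matching file. Both nodes must use the same virtual_router_id.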
| Topic | Command/Feature |
|---|---|
| Custom OS images | genimage |
| Automating xCAT | Ansible, Bash scripts |
| Debugging logs | /var/log/xcat/ |
| VLAN configuration | chdef -t network |
| Security Hardening | iptables, Secure Boot |
| Cloud integration | OpenStack, KVM |
| Kubernetes integration | kubectl, kubeadm |
| High Availability | Keepalived, Pacemaker |
xCAT (Extreme Cloud Administration Toolkit) is an open-source cluster management tool designed for large-scale HPC (High-Performance Computing) and cloud environments. It provides automated deployment, node management, monitoring, and administration.
✅ Key Features of xCAT
✔️ OS Deployment: automated OS installation for diskful and diskless systems
✔️ Scalability: manages thousands of nodes efficiently
✔️ Networking & Security: VLANs, DHCP, firewalls, and authentication
✔️ Virtualization & Cloud Support: OpenStack, KVM, and bare-metal provisioning
✔️ Automation & Scripting: supports Bash, Ansible, Python
xCAT consists of management nodes and compute nodes:
- Management Node: Controls all cluster operations
- Compute Nodes: Run workloads and receive OS images from the management node
🔹 Key Components:
| Component | Description |
|---|---|
| MN (Management Node) | Central controller of the cluster |
| CN (Compute Node) | Worker nodes in the cluster |
| xCAT Database | Stores node configurations and settings |
| xCAT Services | DHCP, DNS, TFTP, NTP, IPMI, etc. |
| Discovery Service | Automatically detects new nodes |
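To install xCAT on a management node (a RHEL-family system, since these steps use yum; the xcat-dep repository must be configured the same way):
wget -P /etc/yum.repos.d https://xcat.org/files/xcat/repos/xcat-core/xcat-core.repo
yum install -y xCAT
After installation, load the xCAT environment and review the default settings:
source /etc/profile.d/xcat.sh
tabdump site   # view default site settings
Set the root password that xCAT assigns to provisioned nodes:
chtab key=system passwd.username=root passwd.password=cluster1234
💡 Tip: Use chtab to modify xCAT tables directly.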
To add compute nodes to the cluster:
mkdef compute01 groups=compute
To list all nodes:
nodels
To assign an IP address:
chdef compute01 ip=192.168.1.10
💡 Tip: Use a group name (for example, compute) as the node range to apply a change to every node in that group.
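Defining many nodes one at a time is tedious. This sketch prints one mkdef command per node, numbering the nodes and giving them consecutive IP addresses. The name prefix, two-digit numbering, and starting address are assumptions, and the output is meant to be reviewed before it is run.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print one mkdef command per node: <prefix>01, <prefix>02, ... with
# consecutive IPv4 addresses starting at first_ip (no subnet-overflow check).
gen_node_defs() {
  local prefix=$1 count=$2 first_ip=$3
  local a b c d
  IFS=. read -r a b c d <<<"$first_ip"
  local base=$(( (a << 24) | (b << 16) | (c << 8) | d ))
  local i ip n
  for (( i = 0; i < count; i++ )); do
    n=$(( base + i ))
    ip="$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
    printf 'mkdef -t node %s%02d groups=compute ip=%s\n' "$prefix" $(( i + 1 )) "$ip"
  done
}

gen_node_defs compute 3 192.168.1.10
```

After checking the output, you can execute it by piping it to bash.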
🔹 xCAT supports two types of OS deployments:
1️⃣ Diskful (Stateful): OS is installed on the local disk
2️⃣ Diskless (Stateless): OS runs from the network (RAM disk)
Diskful deployment:
mkdef -t osimage rhel8-stateful provmethod=install osvers=rhels8 osarch=x86_64 profile=compute
nodeset compute01 osimage=rhel8-stateful
rpower compute01 boot
Diskless deployment:
mkdef -t osimage rhel8-diskless provmethod=netboot osvers=rhels8 osarch=x86_64 profile=compute
genimage rhel8-diskless
packimage rhel8-diskless
nodeset compute01 osimage=rhel8-diskless
rpower compute01 reset
💡 Tip: Diskless OS is ideal for fast, scalable cluster deployments.
xCAT handles network management using DHCP, DNS, VLANs, and firewall rules.
makedhcp -n
💡 Tip: makedhcp -n writes a new /etc/dhcp/dhcpd.conf, so manual edits to that file can be overwritten the next time it runs.
To assign a VLAN to nodes, define its network:
mkdef -t network -o vlan100 net=192.168.100.0 mask=255.255.255.0 mgtifname=eth1 vlanid=100
Allow xcatd (TCP port 3001) only from the trusted subnet:
iptables -A INPUT -s 192.168.1.0/24 -p tcp --dport 3001 -j ACCEPT
💡 Tip: Use VLANs to separate management, compute, and storage networks.
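The rule above accepts only a single subnet on xcatd's port 3001. This sketch prints a complete rule set for any number of trusted subnets. It accepts loopback first, so that xCAT commands run on the management node itself keep working, and drops everything else on that port. The function name is hypothetical, and the rules are printed, not applied.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print iptables commands that allow xcatd (TCP 3001) only from loopback and
# the given trusted subnets, and drop all other traffic to that port.
xcatd_rules() {
  local port=3001 subnet
  echo "iptables -A INPUT -i lo -p tcp --dport $port -j ACCEPT"
  for subnet in "$@"; do
    echo "iptables -A INPUT -s $subnet -p tcp --dport $port -j ACCEPT"
  done
  echo "iptables -A INPUT -p tcp --dport $port -j DROP"
}

xcatd_rules 192.168.1.0/24 10.10.0.0/16
```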
Instead of running multiple xCAT commands manually, we can automate deployments.
#!/bin/bash
NODES=("compute01" "compute02")
for NODE in "${NODES[@]}"; do
  mkdef -t node "$NODE" groups=compute
  nodeset "$NODE" osimage=rhel8-diskless
  rpower "$NODE" boot
done
The first step as an Ansible playbook:
- name: Deploy xCAT Cluster
  hosts: xcat_mgmt
  become: yes
  tasks:
    - name: Add nodes
      command: mkdef -t node compute01 groups=compute
💡 Tip: Use Ansible to automate scaling, monitoring, and security.
xCAT supports KVM, OpenStack, and VMware for managing virtual clusters.
mkdef -t node vm01 groups=compute vmcpus=4 vmmemory=8192   # vmmemory is in MB
For a single-host OpenStack test environment, RDO's packstack installer can be used:
packstack --allinone
💡 Tip: Virtual clusters are ideal for testing and hybrid cloud environments.
xCAT can provision the bare-metal nodes of a Kubernetes cluster and run the setup commands on them.
xdsh compute01 "yum install -y kubeadm kubelet kubectl"
xdsh compute01 "kubeadm init"
kubectl apply -f hpc-job.yaml
💡 Tip: Use MetalLB for load balancing in bare-metal Kubernetes clusters.
For fault tolerance, xCAT supports HA management nodes.
mkdef -t node xcat-ha groups=mgmt
dumpxCATdb -p /tmp/xcatdb
rsync -av /tmp/xcatdb xcat-ha:/tmp/
ssh xcat-ha restorexCATdb -p /tmp/xcatdb
yum install keepalived -y
vi /etc/keepalived/keepalived.conf
systemctl enable --now keepalived
💡 Tip: Use Pacemaker + Corosync for advanced failover setups.
When issues arise, use logs and debugging tools.
tail -f /var/log/xcat/cluster.log
nodestat compute01
xdsh compute01 "tail -n 100 /var/log/messages"
nodeset compute01 install
rpower compute01 boot
💡 Tip: Enable xCAT's debug mode for more detailed logs:
chtab key=xcatdebugmode site.value=1
| Feature | Command/Concept |
|---|---|
| Custom OS images | genimage, nodeset |
| Automated deployment | Ansible, Bash scripting |
| Debugging | /var/log/xcat/ |
| VLAN networking | chdef -t network |
| Security hardening | iptables, Secure Boot |
| Cloud integration | OpenStack, KVM |
| Kubernetes integration | kubectl, kubeadm |
| High Availability | Keepalived, Pacemaker |
xCAT is a powerful tool for automating HPC clusters, cloud deployments, and Kubernetes integration. By mastering OS deployment, networking, security, automation, and HA, you can efficiently manage large-scale infrastructures.
👨‍💻 Crafted by: Suraj Kumar Choudhary | 📩 Feel free to DM for any help: csuraj982@gmail.com