The Node Debug Dashboard includes an SSH server (port 2022) with zsh + oh-my-zsh, 200+ pre-installed diagnostic tools, and custom diagnostic scripts — turning every Kubernetes node into a full debugging workstation.

    # Password auth (default: debug/debug)
    ssh -p 2022 debug@<node-ip>
    # Root access
    ssh -p 2022 root@<node-ip>
    # With key auth
    ssh -p 2022 -i ~/.ssh/id_ed25519 debug@<node-ip>

Both users have full sudo access. The debug user is recommended for daily use.

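
Because the SSH server accepts ordinary `ssh` invocations, the same connection details can be reused non-interactively. The following is a minimal sketch, not part of the dashboard itself: `run_on_nodes` and the `NODE_SSH_PORT` variable are hypothetical names, and it assumes key-based login for the `debug` user is already set up on each node.

```shell
#!/bin/sh
# Hypothetical helper: run one command on several nodes over the
# dashboard's SSH port (2022 unless NODE_SSH_PORT overrides it).
run_on_nodes() {
    cmd="$1"
    shift
    for node in "$@"; do
        echo "== $node =="
        # </dev/null stops ssh from swallowing the caller's stdin.
        ssh -p "${NODE_SSH_PORT:-2022}" -o ConnectTimeout=5 \
            "debug@$node" "$cmd" </dev/null
    done
}
```

For example, `run_on_nodes 'uptime' 10.0.0.11 10.0.0.12` would print a header per node followed by that node's output (the IP addresses are placeholders).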
| Feature | Details |
|---|---|
| Shell | Zsh with oh-my-zsh (agnoster theme) |
| Editor | Vim with custom config (syntax, line numbers, status line) |
| Prompt | Shows user@node-name (Kubernetes node name) |
| Plugins | git, docker, kubectl, colored-man-pages, autosuggestions, syntax-highlighting |
| History | 10,000 entries, shared across sessions, dedup |

Type `aliases` for the full list, or `help-ndiag` for diagnostic script docs.


| Alias | Description |
|---|---|
| `hostns` | Enter full host namespace (mount, net, pid) |
| `hostsh` | Host namespace with /bin/sh |
| `hostcmd <cmd>` | Run a single command in host namespace |
| `hroot` / `hlog` / `hpods` | Navigate to host filesystem locations |
| `hetc` / `hkube` / `hproc` | More host filesystem shortcuts |


| Alias | Description |
|---|---|
| `cps` / `cpsa` | List running / all containers |
| `cpods` | List pods |
| `clog <id>` / `clogf <id>` | Container logs / follow logs |
| `cinsp <id>` | Inspect container details |
| `cstats` | Container resource stats |
| `cexec <id>` | Exec into container |


| Alias | Description |
|---|---|
| `kn` | List cluster nodes |
| `kp` | List all pods (all namespaces) |
| `kpn` | List pods on current node only |
| `kevents` | Last 20 cluster events |


| Alias | Description |
|---|---|
| `ports` / `portsu` | TCP / UDP listeners |
| `conns` | Active TCP connections |
| `ifaces` | Network interfaces (brief) |
| `routes` | Routing table |
| `listen` | LISTEN sockets only |
| `pubip` | External IP |
| `tcpd` / `sniff` | tcpdump shortcuts |


| Alias | Description |
|---|---|
| `topmem` / `topcpu` | Top processes by memory / CPU |
| `psmem` / `pscpu` | ps sorted by memory / CPU |
| `loadavg` | Current load average |
| `memfree` | free -h |
| `iostats` | I/O statistics (5 samples) |


| Alias | Description |
|---|---|
| `dfu` / `dfi` | Disk usage / inode usage |
| `lsblk` | Block devices with model info |
| `smart <dev>` | SMART data for a device |
| `smartall` | Quick SMART health for all disks |

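
SMART output is plain text, so it is easy to post-process in scripts. The sketch below is an illustration only: `smart_failures` is a hypothetical helper, and it assumes its input follows the usual `smartctl` wording (the "SMART overall-health self-assessment test result" line for ATA and NVMe devices, or "SMART Health Status" for SCSI devices), such as the output that `ndiag-disk --raw health` is described as producing elsewhere in this document.

```shell
#!/bin/sh
# Hypothetical helper: read smartctl-style text on stdin and print every
# overall-health verdict that is not PASSED or OK.
# Exit status is 1 if any such verdict was seen, 0 otherwise.
smart_failures() {
    awk -F': *' '
        /SMART overall-health self-assessment test result/ || /SMART Health Status/ {
            verdict = $2
            sub(/[[:space:]]+$/, "", verdict)   # trim trailing whitespace
            if (verdict != "PASSED" && verdict != "OK") {
                print verdict
                bad = 1
            }
        }
        END { exit bad }
    '
}
```

A possible use is `ndiag-disk --raw health | smart_failures || echo "at least one disk needs attention"`.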

| Alias | Description |
|---|---|
| `cpuinfo` | lscpu |
| `meminfo` | DIMM inventory |
| `pcidev` | PCI devices (verbose) |
| `gpuinfo` | nvidia-smi (via nsenter) |
| `gputop` | GPU utilization monitor |
| `sensors` | Temperature readings |


| Alias | Description |
|---|---|
| `dmesg` | Kernel log (timestamped, colored) |
| `podlogs` | List pod log directories |
| `podlog <name>` | Tail logs for a pod (fuzzy match) |

Five hardware diagnostic scripts are included, each with subcommands and built-in help. Tab completion works for all subcommands.
All scripts support `--raw` / `-r`, which prints the unprocessed output of the underlying commands (no formatting or color). This is useful for scripting or piping into other tools:

    ndiag-disk --raw health   # Full smartctl -a output for each disk
    kdiag-node -r status      # Raw K8s Node JSON
    ndiag-mem --raw dimms     # Raw dmidecode -t memory output

CPU diagnostics: frequency, load, top consumers, thermal throttling.


    ndiag-cpu            # Full report (same as 'all')
    ndiag-cpu top        # Top 15 CPU consumers
    ndiag-cpu freq       # Per-core frequency + governor
    ndiag-cpu load       # Load average + CPU time breakdown
    ndiag-cpu throttle   # Thermal zones + dmesg throttle events
    ndiag-cpu --help     # Full documentation

Memory diagnostics: usage, DIMM inventory, swap, OOM detection.


    ndiag-mem          # Full report
    ndiag-mem usage    # Memory gauge + breakdown
    ndiag-mem top      # Top 15 RSS consumers
    ndiag-mem dimms    # Full DIMM detail (size, speed, manufacturer, part number, serial, rank, voltage)
    ndiag-mem swap     # Swap usage + top swap consumers
    ndiag-mem oom      # OOM kills + memory pressure (PSI)
    ndiag-mem --help   # Full documentation

Network diagnostics: interfaces, connections, DNS, connectivity.


    ndiag-net           # Full report
    ndiag-net ifaces    # Interfaces + errors + speed
    ndiag-net conns     # Active TCP connections
    ndiag-net listen    # All listening ports
    ndiag-net dns       # DNS resolution tests
    ndiag-net reach     # Internet, K8s API, DNS, gateway checks
    ndiag-net capture   # Quick 50-packet tcpdump
    ndiag-net --help    # Full documentation

Disk diagnostics: comprehensive SMART health (ATA + NVMe), I/O stats, usage, benchmarks. The script supports USB-SATA bridges, detects HDD/SSD/NVMe, and color-codes warnings for wearout, reallocated sectors, and temperature.


    ndiag-disk          # Full report (health + usage + io)
    ndiag-disk health   # SMART health with full attributes per disk
    ndiag-disk io       # Live I/O stats (1s sample)
    ndiag-disk usage    # Disk space + inode usage
    ndiag-disk bench    # Quick 256MB sequential R/W test
    ndiag-disk --help   # Full documentation

Partition diagnostics: mounts, LVM, filesystems, partition tables.


    ndiag-part          # Full report
    ndiag-part mounts   # Active mounts (filtered)
    ndiag-part lvm      # LVM layout + software RAID
    ndiag-part fs       # Filesystem tree + UUIDs
    ndiag-part table    # GPT/MBR partition tables
    ndiag-part --help   # Full documentation

Six `kdiag-*` scripts provide deep K8s visibility from inside the node. All use the pod's ServiceAccount token and (on control plane nodes) direct etcd client certs. All support `--raw` / `-r` to dump raw JSON.

RBAC requirement: The ServiceAccount needs `get`/`list` on `nodes`, `pods`, `services`, `endpoints`, and `events`.
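
The permissions listed above can be verified before relying on the `kdiag-*` scripts. The following is a sketch under stated assumptions: `check_rbac` is a hypothetical helper, the namespace and ServiceAccount name are supplied by the caller, and the machine running it needs `kubectl` with rights to impersonate the ServiceAccount. It uses `kubectl auth can-i`, which prints `yes` or `no` for each check.

```shell
#!/bin/sh
# Hypothetical helper: report which of the required verb/resource pairs
# the given ServiceAccount is missing.
# Usage: check_rbac <namespace> <serviceaccount-name>
check_rbac() {
    sa="system:serviceaccount:$1:$2"
    missing=0
    for verb in get list; do
        for res in nodes pods services endpoints events; do
            answer="$(kubectl auth can-i "$verb" "$res" \
                --as="$sa" --all-namespaces 2>/dev/null)"
            if [ "$answer" != "yes" ]; then
                echo "missing: $verb $res"
                missing=1
            fi
        done
    done
    return "$missing"
}
```

The function prints nothing and returns 0 when every permission is present. Otherwise it lists the gaps and returns 1.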

[Screenshots: kdiag-node, kdiag-etcd (control plane nodes), kdiag-certs, kdiag-services]

Node health: conditions, resource allocation, taints, pressure, kubelet.

    kdiag-node             # Full report
    kdiag-node status      # Conditions, version, OS, runtime
    kdiag-node resources   # Capacity vs allocatable vs pod requests (with bars)
    kdiag-node taints      # Taints, labels, annotations
    kdiag-node pressure    # PSI (CPU/memory/IO) + K8s pressure conditions
    kdiag-node kubelet     # Component processes, static pods, containerd

Pod diagnostics: sick detection, resource usage, images.


    kdiag-pods             # Full report
    kdiag-pods list        # All pods on this node
    kdiag-pods sick        # CrashLoopBackOff, OOMKilled, Pending, high restarts
    kdiag-pods resources   # Per-pod CPU/memory/GPU requests
    kdiag-pods images      # Container image inventory with counts
    kdiag-pods logs        # Pod log directories sorted by activity

etcd deep dive (control plane nodes only).


    kdiag-etcd           # Full report
    kdiag-etcd health    # Health check, version, process
    kdiag-etcd members   # Member list + leader identification
    kdiag-etcd size      # DB size, fragmentation, quota gauge
    kdiag-etcd alarms    # Active alarms (NOSPACE, CORRUPT)
    kdiag-etcd perf      # Write/read latency + WAL fsync benchmark
    kdiag-etcd keys      # Key count by /registry/* prefix

Certificate audit with color-coded expiry.


    kdiag-certs        # Full audit
    kdiag-certs k8s    # Kubernetes PKI certs (apiserver, kubelet, etc.)
    kdiag-certs etcd   # etcd certs (ca, server, peer, healthcheck)
    kdiag-certs sa     # ServiceAccount token decode + validity test
    kdiag-certs tls    # Live TLS check on apiserver, etcd, kubelet

Color coding: green (>90d), yellow (30-90d), red (<30d).

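
The color bands above amount to two thresholds. The sketch below shows one way to express them. It is not taken from `kdiag-certs`: `expiry_color` and `cert_color` are hypothetical names, and `cert_color` assumes the `openssl` CLI is available and relies on its `x509 -checkend` option, which succeeds only if the certificate is still valid after the given number of seconds.

```shell
#!/bin/sh
# Map days-until-expiry to a band: green (>90d), yellow (30-90d), red (<30d).
expiry_color() {
    days="$1"
    if [ "$days" -gt 90 ]; then
        echo green
    elif [ "$days" -ge 30 ]; then
        echo yellow
    else
        echo red
    fi
}

# Classify a PEM certificate file with the same thresholds.
cert_color() {
    if openssl x509 -checkend $((90 * 86400)) -noout -in "$1" >/dev/null; then
        echo green
    elif openssl x509 -checkend $((30 * 86400)) -noout -in "$1" >/dev/null; then
        echo yellow
    else
        echo red
    fi
}
```

Here `expiry_color 45` prints `yellow`, while `cert_color <path-to-cert>` applies the same bands to a certificate file.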
Service & DNS debugging.

    kdiag-services                # Full report
    kdiag-services list           # All cluster services with type, IP, ports
    kdiag-services dns            # CoreDNS health + resolution tests
    kdiag-services endpoints      # Endpoint readiness per namespace
    kdiag-services connectivity   # Component matrix: apiserver, DNS, etcd, kubelet, internet

Smart event viewer with grouping and live streaming.


    kdiag-events            # Events for this node (default)
    kdiag-events node       # Events involving this node
    kdiag-events warnings   # Cluster-wide warnings grouped by reason
    kdiag-events all        # All recent events (last 50)
    kdiag-events ns gpu     # Events in the "gpu" namespace
    kdiag-events watch      # Live event stream (Ctrl+C to stop)

The container ships with 200+ tools organized by category:

| Category | Tools |
|---|---|
| Network | tcpdump, tshark, nmap, ncat, socat, iperf3, mtr, traceroute, ethtool, iptables, nftables, conntrack, curl, wget, dig |
| Process | htop, btop, strace, ltrace, lsof, sysstat (iostat, mpstat, pidstat, sar), iotop, dstat |
| Disk | smartmontools, hdparm, nvme-cli, fio, blktrace, fdisk, gdisk, parted, lvm2, mdadm |
| Hardware | dmidecode, lshw, lspci, lsusb, cpuid, numactl, hwinfo, efibootmgr, sensors |
| Stress | stress-ng, memtester |
| Container | crictl (container runtime interface) |
| Editors | vim (configured), nano |
| Terminal | tmux, screen, zsh, oh-my-zsh |
| Utilities | jq, tree, git, openssl, rsync, tar, gzip, xz, zstd |

| Environment Variable | Default | Description |
|---|---|---|
| `SSH_ENABLED` | `false` | Enable/disable SSH server |
| `SSH_PORT` | `2022` | SSH listen port |
| `SSH_PASSWORD_AUTH` | `false` | Allow password login |
| `SSH_AUTHORIZED_KEYS` | (none) | Newline-separated public keys |

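
The manifests in this section read public keys from a Kubernetes Secret with a data key named `authorized_keys`. The following is a minimal sketch of preparing that content: `build_authorized_keys` is a hypothetical helper that merges several public key files into newline-separated form, dropping blank lines and comments.

```shell
#!/bin/sh
# Hypothetical helper: concatenate public key files into newline-separated
# authorized_keys content, skipping blank lines and '#' comments.
build_authorized_keys() {
    for f in "$@"; do
        grep -v -e '^[[:space:]]*$' -e '^#' "$f"
    done
}

# Example usage (requires kubectl access; the secret name `ssh-keys`
# matches the first manifest in this section):
#   build_authorized_keys ~/.ssh/id_ed25519.pub > authorized_keys
#   kubectl create secret generic ssh-keys \
#       --from-file=authorized_keys=authorized_keys
```

Naming the data key explicitly with `--from-file=authorized_keys=<path>` keeps the Secret compatible with both the environment variable and the mounted-directory setups.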

Pass the keys from a Secret through the `SSH_AUTHORIZED_KEYS` environment variable:

    env:
      - name: SSH_AUTHORIZED_KEYS
        valueFrom:
          secretKeyRef:
            name: ssh-keys
            key: authorized_keys

Alternatively, mount the secret as a directory containing an `authorized_keys` file. The entrypoint reads `/root/.ssh/authorized_keys_mount/authorized_keys`, so the secret must have a key named `authorized_keys`:


    volumeMounts:
      - name: ssh-keys
        mountPath: /root/.ssh/authorized_keys_mount
        readOnly: true
    volumes:
      - name: ssh-keys
        secret:
          secretName: ssh-authorized-keys
          # secret data must be keyed as `authorized_keys`

Vim key mappings:

| Key | Action |
|---|---|
| `Space` | Leader key |
| `<leader>w` | Save |
| `<leader>q` | Quit |
| `<leader>fh` | Open /host/ |
| `<leader>fp` | Open /host-proc/ |
| `<leader>fl` | Open /host/var/log/ |
| `Ctrl+h/j/k/l` | Window navigation |
| `Alt+j/k` | Move lines up/down |
| `Esc Esc` | Clear search highlight |





