SSH Debug Shell

The Node Debug Dashboard includes an SSH server (port 2022 by default) with zsh + oh-my-zsh, 200+ pre-installed diagnostic tools, and custom diagnostic scripts, turning every Kubernetes node into a full debugging workstation. The server is disabled by default; see Configuration to enable it.

Connecting

# Password auth (default credentials: debug/debug; requires SSH_PASSWORD_AUTH=true)
ssh -p 2022 debug@<node-ip>

# Root access
ssh -p 2022 root@<node-ip>

# With key auth
ssh -p 2022 -i ~/.ssh/id_ed25519 debug@<node-ip>

Both accounts have full administrative access: root directly, and debug through sudo. The debug user is recommended for daily use.
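
For nodes you connect to often, an entry in `~/.ssh/config` saves retyping the port and user. A minimal sketch: `ndd_ssh_entry` is a hypothetical helper name and the host alias and IP are placeholders; only the port (2022) and the `debug` user come from this page.

```shell
# Print an ~/.ssh/config stanza for one node.
# ndd_ssh_entry is a hypothetical helper, not something the dashboard ships.
# $1 = host alias, $2 = node IP, $3 = user (default: debug), $4 = port (default: 2022)
ndd_ssh_entry() {
  printf 'Host %s\n    HostName %s\n    Port %s\n    User %s\n' \
    "$1" "$2" "${4:-2022}" "${3:-debug}"
}

# Example: append an entry, then connect with `ssh node-a`.
# ndd_ssh_entry node-a 10.0.0.12 >> ~/.ssh/config
```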

Shell Environment

Feature	Details
Shell	Zsh with oh-my-zsh (agnoster theme)
Editor	Vim with custom config (syntax, line numbers, status line)
Prompt	Shows `user@node-name` (Kubernetes node name)
Plugins	git, docker, kubectl, colored-man-pages, autosuggestions, syntax-highlighting
History	10,000 entries, shared across sessions, deduplicated

Quick Reference

Type `aliases` for the full list, or `help-ndiag` for diagnostic script docs.

Host Access

Alias	Description
`hostns`	Enter full host namespace (mount, net, pid)
`hostsh`	Host namespace with `/bin/sh`
`hostcmd <cmd>`	Run a single command in host namespace
`hroot` / `hlog` / `hpods`	Navigate to host filesystem locations
`hetc` / `hkube` / `hproc`	More host filesystem shortcuts
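
This page does not show how the host-access aliases are defined. Since `gpuinfo` is documented as running through nsenter, a plausible reading is that `hostcmd` wraps an `nsenter` call targeting PID 1 with the mount, net, and pid namespaces. The sketch below only prints that command line; the function name `host_cmdline` and the exact flags are assumptions, not the shipped definitions.

```shell
# Build the nsenter invocation that would run a command in the host's
# mount, net, and pid namespaces. PID 1 is the host's init only when the
# pod shares the host PID namespace. Assumed form, not the shipped alias.
host_cmdline() {
  printf 'nsenter --target 1 --mount --net --pid --'
  for arg in "$@"; do printf ' %s' "$arg"; done
  printf '\n'
}

host_cmdline uname -r   # prints: nsenter --target 1 --mount --net --pid -- uname -r
```

Running the printed command for real requires a privileged container, which is why the sketch stops at printing it.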

Containers (crictl)

Alias	Description
`cps` / `cpsa`	List running / all containers
`cpods`	List pods
`clog <id>` / `clogf <id>`	Container logs / follow logs
`cinsp <id>`	Inspect container details
`cstats`	Container resource stats
`cexec <id>`	Exec into container

Kubernetes (via ServiceAccount)

Alias	Description
`kn`	List cluster nodes
`kp`	List all pods (all namespaces)
`kpn`	List pods on current node only
`kevents`	Last 20 cluster events

Networking

Alias	Description
`ports` / `portsu`	TCP / UDP listeners
`conns`	Active TCP connections
`ifaces`	Network interfaces (brief)
`routes`	Routing table
`listen`	LISTEN sockets only
`pubip`	External IP
`tcpd` / `sniff`	tcpdump shortcuts

Processes & System

Alias	Description
`topmem` / `topcpu`	Top processes by memory / CPU
`psmem` / `pscpu`	ps sorted by memory / CPU
`loadavg`	Current load average
`memfree`	`free -h`
`iostats`	I/O statistics (5 samples)

Disk & Storage

Alias	Description
`dfu` / `dfi`	Disk usage / inode usage
`lsblk`	Block devices with model info
`smart <dev>`	SMART data for a device
`smartall`	Quick SMART health for all disks

Hardware

Alias	Description
`cpuinfo`	`lscpu`
`meminfo`	DIMM inventory
`pcidev`	PCI devices (verbose)
`gpuinfo`	`nvidia-smi` (via nsenter)
`gputop`	GPU utilization monitor
`sensors`	Temperature readings

Logs

Alias	Description
`dmesg`	Kernel log (timestamped, colored)
`podlogs`	List pod log directories
`podlog <name>`	Tail logs for a pod (fuzzy match)

Diagnostic Scripts

Five hardware diagnostic scripts are included, each with subcommands and built-in help. Tab completion works for all subcommands.

All scripts support `--raw` / `-r`, which prints the raw output of the underlying commands with no formatting or color. This is useful for scripting or piping into other tools:

ndiag-disk --raw health    # Full smartctl -a output for each disk
kdiag-node -r status       # Raw K8s Node JSON
ndiag-mem --raw dimms      # Raw dmidecode -t memory output
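
Because raw mode passes tool output through unmodified, ordinary text tools can post-process it. A minimal sketch, assuming the raw `health` output contains smartctl's usual `SMART overall-health self-assessment test result:` line for each ATA or NVMe disk; `smart_summary` is a hypothetical helper, not part of the scripts.

```shell
# Count disks by SMART overall-health result, reading `smartctl -a`
# style text on stdin. Any result other than PASSED counts as failed.
smart_summary() {
  awk -F': *' '
    /SMART overall-health self-assessment test result/ {
      if ($2 ~ /^PASSED/) passed++; else failed++
    }
    END { printf "passed=%d failed=%d\n", passed, failed }
  '
}

# Usage on a node:
# ndiag-disk --raw health | smart_summary
```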

ndiag-cpu

CPU diagnostics — frequency, load, top consumers, thermal throttling.

ndiag-cpu              # Full report (same as 'all')
ndiag-cpu top          # Top 15 CPU consumers
ndiag-cpu freq         # Per-core frequency + governor
ndiag-cpu load         # Load average + CPU time breakdown
ndiag-cpu throttle     # Thermal zones + dmesg throttle events
ndiag-cpu --help       # Full documentation

ndiag-mem

Memory diagnostics — usage, DIMM inventory, swap, OOM detection.

ndiag-mem              # Full report
ndiag-mem usage        # Memory gauge + breakdown
ndiag-mem top          # Top 15 RSS consumers
ndiag-mem dimms        # Full DIMM detail (size, speed, manufacturer, part number, serial, rank, voltage)
ndiag-mem swap         # Swap usage + top swap consumers
ndiag-mem oom          # OOM kills + memory pressure (PSI)
ndiag-mem --help       # Full documentation
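
PSI (Pressure Stall Information) is exposed by the Linux kernel under `/proc/pressure/`. As an illustration of the data behind the `oom` report, and not the script's own code, this sketch extracts the 10-second `some` average from text in the `/proc/pressure/memory` format; `psi_some_avg10` is a hypothetical helper.

```shell
# Read PSI-formatted lines on stdin, e.g.
#   some avg10=0.00 avg60=0.00 avg300=0.00 total=0
#   full avg10=0.00 avg60=0.00 avg300=0.00 total=0
# and print the avg10 value of the "some" line.
psi_some_avg10() {
  awk '$1 == "some" { sub(/^avg10=/, "", $2); print $2 }'
}

# Usage on a kernel with PSI enabled:
# psi_some_avg10 < /proc/pressure/memory
```

A sustained non-zero value means tasks were stalled waiting on memory during the last ten seconds.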

ndiag-net

Network diagnostics — interfaces, connections, DNS, connectivity.

ndiag-net              # Full report
ndiag-net ifaces       # Interfaces + errors + speed
ndiag-net conns        # Active TCP connections
ndiag-net listen       # All listening ports
ndiag-net dns          # DNS resolution tests
ndiag-net reach        # Internet, K8s API, DNS, gateway checks
ndiag-net capture      # Quick 50-packet tcpdump
ndiag-net --help       # Full documentation

ndiag-disk

Disk diagnostics: comprehensive SMART health (ATA + NVMe), I/O stats, usage, benchmarks. The script supports USB-SATA bridges, detects HDD/SSD/NVMe devices, and color-codes warnings for wear-out, reallocated sectors, and temperature.

ndiag-disk             # Full report (health + usage + io)
ndiag-disk health      # SMART health with full attributes per disk
ndiag-disk io          # Live I/O stats (1s sample)
ndiag-disk usage       # Disk space + inode usage
ndiag-disk bench       # Quick 256MB sequential R/W test
ndiag-disk --help      # Full documentation

ndiag-part

Partition diagnostics — mounts, LVM, filesystems, partition tables.

ndiag-part             # Full report
ndiag-part mounts      # Active mounts (filtered)
ndiag-part lvm         # LVM layout + software RAID
ndiag-part fs          # Filesystem tree + UUIDs
ndiag-part table       # GPT/MBR partition tables
ndiag-part --help      # Full documentation

Kubernetes Diagnostic Scripts

Six `kdiag-*` scripts provide deep K8s visibility from inside the node. All use the pod's ServiceAccount token and (on control plane nodes) direct etcd client certs. All support `--raw` / `-r` to dump raw JSON.

RBAC requirement: The ServiceAccount needs `get`/`list` on nodes, pods, services, endpoints, and events.
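
One way to grant these permissions is a ClusterRole bound to the pod's ServiceAccount. The sketch below prints the two `kubectl create` commands for review rather than running them. The role name, namespace, and ServiceAccount name are placeholders, since this page does not state what the deployment actually uses.

```shell
# Placeholders: adjust to match your deployment.
ROLE_NAME="${ROLE_NAME:-node-debug-reader}"
SA_NAMESPACE="${SA_NAMESPACE:-default}"
SA_NAME="${SA_NAME:-node-debug}"

# Print the kubectl commands; pipe the output to `sh` to apply them.
rbac_commands() {
  printf 'kubectl create clusterrole %s --verb=get,list --resource=nodes,pods,services,endpoints,events\n' \
    "$ROLE_NAME"
  printf 'kubectl create clusterrolebinding %s --clusterrole=%s --serviceaccount=%s:%s\n' \
    "$ROLE_NAME" "$ROLE_NAME" "$SA_NAMESPACE" "$SA_NAME"
}

rbac_commands
```

A ClusterRole is used here instead of a namespaced Role because nodes are cluster-scoped resources.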

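Script	Focus
`kdiag-node`	Node health
`kdiag-pods`	Pod diagnostics
`kdiag-etcd`	etcd deep dive (control plane nodes only)
`kdiag-certs`	Certificate audit
`kdiag-services`	Service & DNS debugging
`kdiag-events`	Event viewer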

kdiag-node

Node health — conditions, resource allocation, taints, pressure, kubelet.

kdiag-node              # Full report
kdiag-node status       # Conditions, version, OS, runtime
kdiag-node resources    # Capacity vs allocatable vs pod requests (with bars)
kdiag-node taints       # Taints, labels, annotations
kdiag-node pressure     # PSI (CPU/memory/IO) + K8s pressure conditions
kdiag-node kubelet      # Component processes, static pods, containerd

kdiag-pods

Pod diagnostics — sick detection, resource usage, images.

kdiag-pods              # Full report
kdiag-pods list         # All pods on this node
kdiag-pods sick         # CrashLoopBackOff, OOMKilled, Pending, high restarts
kdiag-pods resources    # Per-pod CPU/memory/GPU requests
kdiag-pods images       # Container image inventory with counts
kdiag-pods logs         # Pod log directories sorted by activity

kdiag-etcd

etcd deep dive (control plane nodes only).

kdiag-etcd              # Full report
kdiag-etcd health       # Health check, version, process
kdiag-etcd members      # Member list + leader identification
kdiag-etcd size         # DB size, fragmentation, quota gauge
kdiag-etcd alarms       # Active alarms (NOSPACE, CORRUPT)
kdiag-etcd perf         # Write/read latency + WAL fsync benchmark
kdiag-etcd keys         # Key count by /registry/* prefix

kdiag-certs

Certificate audit with color-coded expiry.

kdiag-certs             # Full audit
kdiag-certs k8s         # Kubernetes PKI certs (apiserver, kubelet, etc.)
kdiag-certs etcd        # etcd certs (ca, server, peer, healthcheck)
kdiag-certs sa          # ServiceAccount token decode + validity test
kdiag-certs tls         # Live TLS check on apiserver, etcd, kubelet

Color coding: green (>90d), yellow (30-90d), red (<30d).
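
The thresholds above can be expressed as a small classifier. This is a sketch, not the script's implementation: `expiry_color` and `cert_days_left` are hypothetical helpers, `cert_days_left` relies on GNU `date -d`, and the certificate path in the usage line is only an example.

```shell
# Map days-until-expiry to the documented bands:
# green (>90d), yellow (30-90d), red (<30d).
expiry_color() {
  if [ "$1" -gt 90 ]; then echo green
  elif [ "$1" -ge 30 ]; then echo yellow
  else echo red
  fi
}

# Whole days remaining for a PEM certificate file (openssl + GNU date).
cert_days_left() {
  end=$(openssl x509 -enddate -noout -in "$1" | cut -d= -f2)
  echo $(( ($(date -d "$end" +%s) - $(date +%s)) / 86400 ))
}

# Usage (example path):
# expiry_color "$(cert_days_left /host/etc/kubernetes/pki/apiserver.crt)"
```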

kdiag-services

Service & DNS debugging.

kdiag-services          # Full report
kdiag-services list     # All cluster services with type, IP, ports
kdiag-services dns      # CoreDNS health + resolution tests
kdiag-services endpoints  # Endpoint readiness per namespace
kdiag-services connectivity  # Component matrix: apiserver, DNS, etcd, kubelet, internet

kdiag-events

Smart event viewer with grouping and live streaming.

kdiag-events            # Events for this node (default)
kdiag-events node       # Events involving this node
kdiag-events warnings   # Cluster-wide warnings grouped by reason
kdiag-events all        # All recent events (last 50)
kdiag-events ns gpu     # Events in the "gpu" namespace
kdiag-events watch      # Live event stream (Ctrl+C to stop)

Pre-installed Tools

The container ships with 200+ tools organized by category:

Category	Tools
Network	tcpdump, tshark, nmap, ncat, socat, iperf3, mtr, traceroute, ethtool, iptables, nftables, conntrack, curl, wget, dig
Process	htop, btop, strace, ltrace, lsof, sysstat (iostat, mpstat, pidstat, sar), iotop, dstat
Disk	smartmontools, hdparm, nvme-cli, fio, blktrace, fdisk, gdisk, parted, lvm2, mdadm
Hardware	dmidecode, lshw, lspci, lsusb, cpuid, numactl, hwinfo, efibootmgr, sensors
Stress	stress-ng, memtester
Container	crictl (container runtime interface)
Editors	vim (configured), nano
Terminal	tmux, screen, zsh, oh-my-zsh
Utilities	jq, tree, git, openssl, rsync, tar, gzip, xz, zstd

Configuration

Environment Variable	Default	Description
`SSH_ENABLED`	`false`	Enable/disable SSH server
`SSH_PORT`	`2022`	SSH listen port
`SSH_PASSWORD_AUTH`	`false`	Allow password login
`SSH_AUTHORIZED_KEYS`	—	Newline-separated public keys

Key-based Auth via Kubernetes Secret

env:
  - name: SSH_AUTHORIZED_KEYS
    valueFrom:
      secretKeyRef:
        name: ssh-keys
        key: authorized_keys

Or mount the secret as a directory containing an authorized_keys file (the entrypoint reads /root/.ssh/authorized_keys_mount/authorized_keys, so the secret must have a key named authorized_keys):

volumeMounts:
  - name: ssh-keys
    mountPath: /root/.ssh/authorized_keys_mount
    readOnly: true
volumes:
  - name: ssh-keys
    secret:
      secretName: ssh-authorized-keys
      # secret data must be keyed as `authorized_keys`
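
Either approach needs a Secret whose `authorized_keys` entry holds one public key per line. A sketch of assembling that content from several `.pub` files: `build_authorized_keys` is a hypothetical helper, the key file names are placeholders, and the Secret name matches the volume example above.

```shell
# Concatenate public key files into newline-separated authorized_keys
# content, dropping blank lines and comment lines.
build_authorized_keys() {
  for f in "$@"; do
    grep -v -e '^[[:space:]]*$' -e '^#' "$f"
  done
}

# Usage: write the combined file, then create the Secret from it.
# build_authorized_keys ~/.ssh/id_ed25519.pub teammate.pub > authorized_keys
# kubectl create secret generic ssh-authorized-keys --from-file=authorized_keys=authorized_keys
```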

Vim Keybindings

Key	Action
`Space`	Leader key
`<leader>w`	Save
`<leader>q`	Quit
`<leader>fh`	Open `/host/`
`<leader>fp`	Open `/host-proc/`
`<leader>fl`	Open `/host/var/log/`
`Ctrl+h/j/k/l`	Window navigation
`Alt+j/k`	Move lines down/up
`Esc Esc`	Clear search highlight
