This repository contains tools and scripts for collecting data and diagnosing issues in Submariner deployments.
Collect diagnostics once from live clusters, then analyze offline anytime - no cluster access needed for analysis.
- Smart Automated Collection: Intelligently gathers diagnostic data from both clusters
- Basic Analysis (Python): Fast pattern-matching for common issues - no AI required
- Advanced Analysis (Claude Code): Deep AI-powered root cause analysis
git clone https://github.com/submariner-io/submariner-diagnostics.git
cd submariner-diagnostics
# Required for collection
kubectl version --client
subctl version
# Optional: needed only for basic analysis
python3 --version
pip install pyyaml
- kubectl - Installation Guide
- subctl - Installation Guide
- python3 and pyyaml - Only needed for basic analysis (optional)
- Access to both Submariner clusters with valid kubeconfig files
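As a rough illustration of the prerequisites above, the following Python sketch checks whether the command-line tools are available on PATH before collection is started. It uses only the standard library. The function names and the `REQUIRED_TOOLS`/`OPTIONAL_TOOLS` constants are hypothetical and not part of this repository.

```python
import shutil

# Tool names taken from the prerequisites above; the grouping is an assumption.
REQUIRED_TOOLS = ["kubectl", "subctl"]
OPTIONAL_TOOLS = ["python3"]


def find_tools(names):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names):
    """Return the tool names that could not be found on PATH."""
    return [name for name, path in find_tools(names).items() if path is None]
```

Calling `missing_tools(REQUIRED_TOOLS)` returns an empty list when both tools are installed. It confirms only that the tools are present on PATH, not that their versions are compatible.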
./collect-full-diagnostics.sh <cluster1-context> <cluster1-kubeconfig> <cluster2-context> <cluster2-kubeconfig> [issue-description]
- cluster1-context: Context name for cluster 1 (from kubeconfig)
- cluster1-kubeconfig: Path to kubeconfig file for cluster 1
- cluster2-context: Context name for cluster 2 (from kubeconfig)
- cluster2-kubeconfig: Path to kubeconfig file for cluster 2
- issue-description: Optional description of the issue
# Separate kubeconfig files
./collect-full-diagnostics.sh \
prod-east /path/to/kubeconfig-east \
prod-west /path/to/kubeconfig-west \
"tunnel not connected"
# Single kubeconfig with multiple contexts
./collect-full-diagnostics.sh \
context-cluster1 /path/to/merged-kubeconfig \
context-cluster2 /path/to/merged-kubeconfig \
"connectivity issues"
# Without issue description (defaults to "undefined")
./collect-full-diagnostics.sh \
prod-east /path/to/kubeconfig-east \
prod-west /path/to/kubeconfig-west
Output: submariner-diagnostics-TIMESTAMP.tar.gz
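Because the output is an ordinary gzip-compressed tarball, it can be inspected offline without extracting it. The sketch below lists the archive members using Python's standard `tarfile` module. The helper names are hypothetical, and the internal directory layout of the archive is not assumed.

```python
import tarfile


def list_archive(path):
    """Return the sorted member names of a .tar.gz archive without extracting it."""
    with tarfile.open(path, "r:gz") as archive:
        return sorted(archive.getnames())


def find_members(path, suffix):
    """Return the member names that end with the given suffix, e.g. '.log'."""
    return [name for name in list_archive(path) if name.endswith(suffix)]
```

For example, `find_members("submariner-diagnostics-TIMESTAMP.tar.gz", "collection.log")` would locate the collection log inside the archive, wherever the script placed it.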
./analyze-basic.py submariner-diagnostics-TIMESTAMP.tar.gz
- Version compatibility issues (subctl vs Submariner)
- Submariner software bugs with automatic GitHub search
  - Detects known bugs (e.g., libreswan version incompatibility)
  - Searches GitHub for existing fixes or workarounds
  - Shows PR merge dates and issue status
  - Prevents duplicate bug reports
- Tunnel connectivity status
- ESP/UDP protocol blocking
- Firewall blocking (inter-cluster and intra-cluster)
- MTU/fragmentation issues
- Pod health issues
- Packet flow patterns (from tcpdump)
- RouteAgent connectivity with gateway correlation
  - Distinguishes intra-cluster vs inter-cluster issues
  - Detects control plane connectivity patterns
  - Identifies root cause segment (local routing vs inter-cluster)
- Network topology analysis (when RouteAgent errors detected)
  - Detects potential non-flat networking scenarios
  - Analyzes node subnet distribution
- Common misconfigurations
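To give a feel for what pattern-matching analysis of this kind looks like, here is a minimal sketch that scans text for a couple of regular expressions and groups the matching lines by label. It is not the logic of `analyze-basic.py`. The patterns, labels, and function name are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical patterns for illustration only; the real rules live in analyze-basic.py.
PATTERNS = {
    "tunnel_not_connected": re.compile(r"status\s*[:=]\s*(error|connecting)", re.IGNORECASE),
    "mtu_suspected": re.compile(r"fragmentation needed|message too long", re.IGNORECASE),
}


def scan_text(text):
    """Return {label: [matching lines]} for every pattern that matched at least once."""
    findings = {}
    for line in text.splitlines():
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.setdefault(label, []).append(line.strip())
    return findings
```

A table of labelled patterns like this keeps each check independent, so new checks can be added without changing the scanning loop.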
No setup required - just Python 3 with PyYAML:
pip install pyyaml
For deeper analysis with Claude AI:
# Install the Claude Code skill (creates /submariner:analyze-offline command)
mkdir -p ~/.claude/commands/submariner
cp analyze-offline.md ~/.claude/commands/submariner/analyze-offline.md
Restart Claude Code, then verify the command is available:
/submariner:analyze-offline
Note: The command will appear as /submariner:analyze-offline in Claude Code.
How it works: The skill uses modular analysis guides from docs/analysis/ in this repository. These are
automatically accessible when the skill runs - no need to copy them separately.
/submariner:analyze-offline submariner-diagnostics-TIMESTAMP.tar.gz
# Or with specific issue description
/submariner:analyze-offline submariner-diagnostics-TIMESTAMP.tar.gz "tunnel not connected"
- Claude Code installed
- Claude subscription
- MTU/fragmentation issues (classic pattern: small packets pass, large packets fail)
- Submariner software bugs with GitHub search
  - Automatically searches for known issues and fixes
  - Provides upgrade recommendations when fix is available
  - Links to relevant PRs and issues
- Infrastructure-level blocking patterns from tcpdump analysis
- All issues detected by basic analysis
- Deep root cause analysis with context
- Probabilistic reasoning ("most likely", "appears to be")
- Step-by-step solutions with deployment-specific commands (ACM vs Standalone)
- Official documentation references
- Further investigation steps if initial solution fails
- Guidance on contacting Submariner experts for software bugs
- collection.log - Complete collection output including any errors
- subctl gather - Comprehensive cluster data including:
  - Submariner CRs (Gateway, Endpoints, RouteAgents)
  - Pod logs and status
  - IPsec status and traffic counters
  - Network configuration (routes, iptables, XFRM policies)
- subctl show - Connection status overview
- subctl show versions - Component version information
- subctl diagnose - Health check results
- Gateway and RouteAgent status
- ACM resources (if present)
- Version compatibility check (in manifest.txt)
tcpdump packet captures from gateway nodes (80-second capture)
- Automatically captured if either tunnel shows status != connected
- Helps diagnose infrastructure-level blocking (ESP/UDP)
- Includes text analysis summaries for offline review
- Benefit: Identifies where packets are being dropped
Firewall inter-cluster diagnostics (subctl diagnose firewall inter-cluster)
- Runs when: Tunnel not connected + UDP encapsulation (VxLAN or IPSec NAT-T)
- Skipped when: Using ESP (protocol 50) - test only checks UDP ports
- Benefit: Verifies if UDP ports are open between gateway nodes
- Cross-referenced with: tcpdump data (UDP traffic) + IPsec counters (ipsec-trafficstatus.log)
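The run and skip conditions for the inter-cluster firewall check can be restated as a small predicate. This is a sketch of the rule as described above, not code from the collection script. The encapsulation labels (`"vxlan"`, `"ipsec-natt"`, `"esp"`) and the function name are hypothetical.

```python
# Hypothetical labels for the encapsulation modes named in the text above.
UDP_ENCAPSULATIONS = {"vxlan", "ipsec-natt"}


def should_run_inter_cluster_firewall_check(tunnel_connected, encapsulation):
    """Run only when the tunnel is not connected and traffic is UDP-encapsulated.

    With plain ESP (protocol 50) the check is skipped, since it only tests UDP ports.
    """
    return (not tunnel_connected) and encapsulation.lower() in UDP_ENCAPSULATIONS
```

Keeping the UDP modes in one set makes the skip case for ESP explicit: anything not listed there returns `False`.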
- Default packet size tests
- Small packet size tests (MTU detection)
- Service discovery tests (if enabled)
- Benefit: Validates end-to-end connectivity
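Comparing the default and small packet size results is what makes MTU detection possible: the classic MTU signature is that small packets pass while larger ones fail. The sketch below encodes that comparison. The function name and result labels are hypothetical, and the actual analysis may weigh additional evidence.

```python
def classify_packet_size_results(default_size_passed, small_size_passed):
    """Interpret the outcome of the default-size and small-size connectivity tests."""
    if default_size_passed and small_size_passed:
        return "ok"
    if small_size_passed:
        # Small packets pass but default-size packets fail: classic MTU pattern.
        return "mtu-suspected"
    if default_size_passed:
        # Default-size passes but small fails: not explained by MTU alone.
        return "inconclusive"
    return "general-connectivity-failure"
```

Only the "small passes, default fails" combination points at MTU or fragmentation. When both fail, the problem is more likely a blocked or broken path than packet size.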
Firewall intra-cluster diagnostics (subctl diagnose firewall intra-cluster)
- Runs per cluster when the CNI is not OVN-Kubernetes (OVNK); each cluster is checked independently
- Runs regardless of tunnel status
- Benefit: Verifies VXLAN traffic allowed on vx-submariner interface
- Expected failures: RouteAgent issues + verify test failures from non-gateway pods
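Since the intra-cluster check is decided per cluster and does not depend on tunnel status, the selection can be sketched as a filter over each cluster's CNI. This is an illustration only. The CNI name spellings in `OVNK_NAMES` and the function name are assumptions, and the collection script may detect the CNI differently.

```python
# Assumed spellings of the OVN-Kubernetes CNI name; adjust to what your clusters report.
OVNK_NAMES = {"ovnk", "ovnkubernetes", "ovn-kubernetes"}


def clusters_needing_intra_cluster_check(cni_by_cluster):
    """Return the sorted names of clusters whose CNI is not OVN-Kubernetes.

    Each cluster is evaluated independently; tunnel status is deliberately ignored.
    """
    return sorted(
        cluster
        for cluster, cni in cni_by_cluster.items()
        if cni.lower() not in OVNK_NAMES
    )
```

With one OVN-Kubernetes cluster and one cluster on another CNI, only the latter would be selected for the intra-cluster firewall check.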
The analysis logic is organized into modular, focused guides:
- CLAUDE.md - Repository overview and analysis principles
- analyze-offline.md - Claude Code skill entry point (install this)
- docs/analysis/ - Modular analysis guides:
  - datapath-architecture.md - Submariner datapath fundamentals (non-OVN vs OVN)
  - tunnel-analysis.md - Tunnel connectivity and IPsec datapath
  - asymmetric-tunnel-analysis.md - Asymmetric tunnel investigation
  - firewall-analysis.md - Network/firewall blocking (tcpdump)
  - mtu-analysis.md - MTU and fragmentation issues
  - gateway-ha-analysis.md - Gateway HA status checks
  - routeagent-analysis.md - RouteAgent and OVN-specific checks
  - deployment-detection.md - ACM vs Standalone detection
  - report-format.md - Analysis report templates
  - special-cases.md - Edge cases and special scenarios
This modular structure makes the codebase easier to maintain while keeping the user experience simple (/submariner:analyze-offline).
- bash
- kubectl
- subctl - Installation Guide
- Access to both Submariner clusters
Note: Packet captures are performed inside the cluster using containers - no
local tcpdump installation required.
- python3 (3.6+)
- pyyaml - Install: pip install pyyaml
- Claude Code
- Claude subscription
For airgap environments, ensure the following container image is mirrored to your internal registry:
- quay.io/submariner/nettest:devel - Used by several components in the collection script (firewall diagnostics, tcpdump collection, connectivity verification)
Contributions welcome! Please submit issues or PRs to: https://github.com/submariner-io/submariner-diagnostics
- Collection/Analysis Issues: GitHub Issues
- Submariner Bugs: Submariner GitHub
- Community Help: Submariner Slack
Apache 2.0