AI analysis of migration failures #1998
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ndpoint, and tests

Implements Phase 2 of the Analyse with AI feature:
- adds analyzer.py with extract_error_keywords, build_user_message, build_github_issue, parse_claude_response, query_rag, and analyze_migration
- adds /analyze-migration POST endpoint to server.py
- adds error_catalog.md with known vJailbreak error patterns
- updates Dockerfile and docker-compose.yml for the vjailbreak-ai service

All 16 tests pass (13 analyzer + 3 server).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…or context ConfigMap
…8s Secret
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…Issue fallback
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… tests (T051-T053)
…ote to deployment
Add graphify-out/ to .gitignore and untrack all 833 files. graphify regenerates these on every session, dirtying git status and polluting unrelated commits. Files remain on disk for local use.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
AI calls used /vpw/v1/... directly. In production, these bypassed the nginx ingress rewrite rule (/dev-api/sdk/(.*) → /$1) and fell through to the UI nginx location / block, which requires Basic Auth. That caused the htpasswd popup on the GlobalSettings and AI Analysis pages. All other vpwned API calls (helpers.ts, vddk.ts, version.ts) already use the /dev-api/sdk/vpw/v1/... pattern correctly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…pod logs

Status.AgentName stores the node name (pod.Spec.NodeName), not the pod name. Spec.PodRef is the correct field holding the v2v-helper pod name.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…lhost

The vpwned pod has no nginx on localhost:80. Debug logs are served by vjailbreak-ui-service.migration-system.svc.cluster.local/debug-logs.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pydantic rejects null for dict[str, Any] fields. migration_plan,
migration_template, network_mapping, storage_mapping are nil when absent,
causing 422 from vjailbreak-ai. nilToEmptyMap converts nil → {} before
JSON serialization.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
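
The same normalization could also be mirrored defensively on the receiving side. The following is only a minimal Python sketch of the idea, not code from this PR (which does the conversion in Go via nilToEmptyMap). The helper name `null_to_empty` is hypothetical; the four field names are taken from the commit message above.

```python
from typing import Any

# Field names come from the commit message; the helper itself is hypothetical.
MAPPING_FIELDS = (
    "migration_plan",
    "migration_template",
    "network_mapping",
    "storage_mapping",
)


def null_to_empty(payload: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of payload in which absent or null mapping fields become {}."""
    cleaned = dict(payload)
    for field in MAPPING_FIELDS:
        if cleaned.get(field) is None:
            cleaned[field] = {}
    return cleaned
```

Applied before validation, a payload such as `{"migration_plan": None}` would reach the model as `{"migration_plan": {}, ...}` instead of triggering a 422.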
- Go: log full payload at DEBUG level before sending to vjailbreak-ai
- Go: log response body when vjailbreak-ai returns non-200
- Python: add RequestValidationError handler that logs request body + Pydantic errors at ERROR level
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…l in JSON Go nil slice marshals to null; Pydantic rejects null for list[str]. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…history

- Track followUpLoading separately from loading so the result panel stays visible (no full-screen spinner) while a follow-up request is in flight; Send button shows inline spinner instead
- Include initial user turn in history so the model has context for follow-up questions (was only storing assistant turn)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Per-source log budget strategy:
- v2v pod logs: raw tail 100k chars (failures always at end)
- controller logs: error extraction context=5/tail=200, cap 50k chars
- debug logs: error extraction context=3/no-tail, cap 50k chars, max 5 files

Worst-case total ~100k tokens, well under the 200k API limit.

Also adds deploy/vjailbreak-ai-context-configmap.yaml with default operator context covering vJailbreak architecture, common failure patterns, and analysis guidelines for more accurate AI responses.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
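
The per-source budget described in this commit can be pictured roughly as follows. This is a sketch under stated assumptions rather than the PR's implementation: the function names, the error regex, and the choice to apply the debug-log cap to the combined output (keeping the tail) are all hypothetical. Only the numeric limits come from the commit message.

```python
import re

# Assumed error pattern; the real extraction rule is not shown in the PR text.
ERROR_RE = re.compile(r"error|fail|fatal|panic", re.IGNORECASE)


def tail_chars(text: str, limit: int) -> str:
    """Keep only the last `limit` characters of text."""
    if limit <= 0:
        return ""
    return text[-limit:] if len(text) > limit else text


def extract_errors(text: str, context: int, tail: int, cap: int) -> str:
    """Keep lines within `context` of an error match, plus the last `tail` lines."""
    lines = text.splitlines()
    keep: set[int] = set()
    for i, line in enumerate(lines):
        if ERROR_RE.search(line):
            keep.update(range(max(0, i - context), min(len(lines), i + context + 1)))
    if tail > 0:
        keep.update(range(max(0, len(lines) - tail), len(lines)))
    selected = "\n".join(lines[i] for i in sorted(keep))
    return tail_chars(selected, cap)


def build_log_sections(v2v: str, controller: str, debug_files: list[str]) -> dict[str, str]:
    """Apply the per-source limits listed in the commit message."""
    debug = "\n".join(
        extract_errors(f, context=3, tail=0, cap=50_000) for f in debug_files[:5]
    )
    return {
        "v2v": tail_chars(v2v, 100_000),
        "controller": extract_errors(controller, context=5, tail=200, cap=50_000),
        "debug": tail_chars(debug, 50_000),  # assumption: cap applies to the total
    }
```

Keeping the raw tail for v2v pod logs and extracted context for the noisier sources follows the commit's own reasoning that v2v failures appear at the end of the log.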
Backend: detect follow-up (question + non-empty history) and switch to a conversational system prompt instead of the JSON-forcing analysis prompt. Return is_followup:true so the frontend knows not to replace the structured result.

Frontend: only call setResult() on initial analysis. Follow-up responses are appended to history only. Render history.slice(2) as a Q&A thread below the initial analysis so the conversation is visible.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude Haiku was returning JSON for follow-up questions because the conversation history stored the raw JSON response as the assistant's message, causing it to pattern-match the format. Two fixes:
1. Frontend: store human-readable text in history (root cause + fix steps + summary) instead of raw JSON for the initial analysis turn
2. Backend: strengthen FOLLOWUP_SYSTEM_PROMPT with explicit "NO JSON" instruction and clearer prose-only directive
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Backend: build_github_issue is now called for all confidence levels. should_open stays true only for none/low confidence (AI recommendation), but prefill_url is always present.

Frontend: add a subtle "Still having issues? Open a GitHub Issue" link at the bottom of the structured analysis for high/medium confidence results, right-aligned in small secondary text.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
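
The split between should_open and prefill_url might look something like the sketch below. It is not the PR's build_github_issue: the function name `build_issue_link`, its parameters, the title format, and the repository URL are assumptions made for illustration. The `title` and `body` query parameters are the standard way to prefill GitHub's new-issue form.

```python
from urllib.parse import urlencode

# Assumed repository location; adjust to the real project URL.
ISSUE_BASE_URL = "https://github.com/platform9/vjailbreak/issues/new"


def build_issue_link(migration_name: str, summary: str, confidence: str) -> dict:
    """Always return a prefilled URL; recommend opening it only on low confidence."""
    query = urlencode({
        "title": f"Migration failure: {migration_name}",  # hypothetical title format
        "body": summary,
    })
    return {
        "should_open": confidence in ("none", "low"),
        "prefill_url": f"{ISSUE_BASE_URL}?{query}",
    }
```

With this shape the frontend can show the link for every result and use should_open only to decide how prominently to present it.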
Backend may not have been rebuilt yet. Use prefill_url from response when available, otherwise fall back to a minimal prefilled title constructed from migrationName. Link now unconditionally visible. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…vels

Both the confidence=none section and the structured result section now use an outlined Button with endIcon instead of a plain text link, making the GitHub Issue action more discoverable and easier to click.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Makefile:
- Add AI_IMG variable (quay.io/platform9/vjailbreak-ai)
- Add `vjailbreak-ai` target: docker build vjailbreak-ai/

packer.yml:
- Add AI_IMG env var and ai_img output to determine-release
- Add build-vjailbreak-ai job (parallel to other builds)
- Wire build-vjailbreak-ai into push-images needs/conditions
- Wire build-vjailbreak-ai into post-build needs/conditions
- Download and push vjailbreak-ai artifact in push-images
- envsubst vjailbreak-ai manifest → image_builder/deploy/08vjailbreak-ai.yaml

download_images.sh:
- Pull and export vjailbreak-ai image tar for VM baking

vjailbreak-ai/deploy/vjailbreak-ai.yaml:
- Deployment + Service + PVC for vjailbreak-ai in migration-system
- Reads ANTHROPIC_API_KEY + ADMIN_API_KEY from vjailbreak-ai-secret
- Mounts /data PVC for ChromaDB and context.md persistence

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…et key names

install.sh: after applying yamls (migration-system namespace exists), generate a random admin-key with openssl rand -hex 32 and store it in vjailbreak-ai-secret. No manual step required. ANTHROPIC_API_KEY is left empty; the user sets it via the Settings UI (ai_key_handler stores it as api-key in the same secret).

vjailbreak-ai.yaml: fix secretKeyRef key names to match what ai_key_handler.go writes: api-key (ANTHROPIC_API_KEY) and admin-key (ADMIN_API_KEY).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
🚨 Security Vulnerability Summary
Security posture degraded.
Detailed breakdown (tables not preserved): Gosec (Static Analysis), Trivy (Dependency Scan), Baseline Methods.
Added vulnerabilities: Trivy (Dependencies), 3 added (targets not preserved).
Only HIGH and CRITICAL severity vulnerabilities are tracked.
    if not error_keywords or chroma_client is None:
        return ""
    try:
        collection = chroma_client.get_collection("vjailbreak_docs")
🔴 ChromaDB collection name mismatch between server.py and analyzer.py causes RAG retrieval to always fail
The server creates a ChromaDB collection named "vjailbreak" at vjailbreak-ai/server.py:69, but analyzer.py:170 tries to retrieve from a collection named "vjailbreak_docs" via chroma_client.get_collection("vjailbreak_docs"). Since the collection vjailbreak_docs is never created, get_collection raises an exception, which is silently caught by the except Exception at line 180. This means the RAG pipeline — the key differentiator for providing relevant documentation context to the AI — silently produces empty results on every analysis call. The AI will never receive relevant documentation snippets (virt-v2v docs, troubleshooting guides, error catalog) during migration failure analysis.
Suggested change:
-        collection = chroma_client.get_collection("vjailbreak_docs")
+        collection = chroma_client.get_collection("vjailbreak")
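
Beyond the one-line rename, one way to keep the two files from drifting again is to share a single constant and log the failure instead of swallowing it. The sketch below is a hypothetical reshaping of query_rag, not the PR's code: the signature, the `n_results` default, and the logger name are assumptions. `get_collection(name)` and `collection.query(query_texts=..., n_results=...)` are existing ChromaDB client calls.

```python
import logging

logger = logging.getLogger("analyzer")

# Single source of truth; server.py would import this instead of repeating the string.
COLLECTION_NAME = "vjailbreak"


def query_rag(chroma_client, error_keywords: list[str], n_results: int = 3) -> str:
    """Return matching documentation snippets, or "" when retrieval is unavailable."""
    if not error_keywords or chroma_client is None:
        return ""
    try:
        collection = chroma_client.get_collection(COLLECTION_NAME)
        results = collection.query(
            query_texts=[" ".join(error_keywords)], n_results=n_results
        )
    except Exception:
        # Still degrade gracefully, but leave a trace so a mismatch is visible.
        logger.warning("RAG lookup failed for %r", COLLECTION_NAME, exc_info=True)
        return ""
    documents = results.get("documents") or [[]]
    return "\n\n".join(documents[0])
```

The warning with `exc_info=True` means a missing collection shows up in the logs on the first analysis call rather than silently producing empty context.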
    if not ANTHROPIC_API_KEY:
        raise RuntimeError("ANTHROPIC_API_KEY is not set")
    if not ADMIN_API_KEY:
        raise RuntimeError("ADMIN_API_KEY is not set — generate one with: python -c \"import secrets; print(secrets.token_hex(32))\"")
🔴 vjailbreak-ai server crashes on startup when API keys are not yet configured, contradicting optional: true deployment
The lifespan function in server.py:61-64 raises RuntimeError if the ANTHROPIC_API_KEY or ADMIN_API_KEY environment variable is empty. However, the Kubernetes deployment manifests (vjailbreak-ai/deploy/vjailbreak-ai.yaml:54,60 and deploy/vjailbreak-ai/deployment.yaml:40,46) mark both secret key references as optional: true, which means the env vars are left unset (and so read as empty) when the secret key hasn't been set yet. install.sh:222-226 creates the secret with admin-key only (no api-key). So immediately after installation, the pod reads an empty ANTHROPIC_API_KEY and crash-loops. The design doc explicitly states the pod should start without the Secret, but this code prevents that.
Prompt for agents
The lifespan function raises RuntimeError when ANTHROPIC_API_KEY or ADMIN_API_KEY are empty, but the K8s deployment uses optional: true on the secret references, meaning the pod should start without them. The install.sh script only creates the secret with admin-key, not api-key. The fix should allow the server to start gracefully without these keys, logging a warning instead of crashing. The /health endpoint should still work. Analysis endpoints should return a clear error (e.g. 503 with 'API key not configured') when ANTHROPIC_API_KEY is missing. Admin endpoints should return 401 when ADMIN_API_KEY is missing. This matches the design doc's intent: 'The Deployment uses optional: true so it starts without the Secret, but AI analysis calls will fail until the key is configured.'
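
The behaviour the review asks for (warn at startup, 503 for analysis, 401 for admin) can be sketched framework-free as below. The helper names and the `(status, message)` return shape are hypothetical; in the real server these checks would sit inside the lifespan function and the endpoint handlers.

```python
import hmac
import logging
import os
from typing import Mapping

logger = logging.getLogger("vjailbreak-ai")


def warn_on_missing_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Log a warning for each unset key instead of raising at startup."""
    missing = [n for n in ("ANTHROPIC_API_KEY", "ADMIN_API_KEY") if not env.get(n)]
    for name in missing:
        logger.warning("%s is not set; dependent endpoints stay disabled", name)
    return missing


def analysis_gate(anthropic_key: str) -> tuple[int, str]:
    """Status for analysis endpoints: 503 until the API key is configured."""
    if not anthropic_key:
        return 503, "API key not configured"
    return 200, "ok"


def admin_gate(admin_key: str, presented: str) -> tuple[int, str]:
    """Status for admin endpoints: 401 when the key is unset or does not match."""
    if not admin_key or not hmac.compare_digest(admin_key.encode(), presented.encode()):
        return 401, "unauthorized"
    return 200, "ok"
```

Because nothing here raises, a /health endpoint would keep responding while the keys are absent, which is the state the pod is in right after install.sh runs.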
  const handleAnalyse = useCallback(() => {
    setResult(null)
    setHistory([])
    runAnalysis()
  }, [runAnalysis])
🟡 Stale closure in handleAnalyse sends old conversation history when re-running analysis
handleAnalyse at AIAnalysisTab.tsx:98-102 calls setHistory([]) to reset history, then immediately calls runAnalysis(). However, runAnalysis (line 63) reads history from its closure, which still holds the previous value because React state updates are batched and not yet applied. On a re-analysis click, the old conversation history is sent to the backend instead of an empty array. This means the AI receives stale context from a prior analysis, potentially confusing the response.
Stale closure mechanism
setHistory([]) schedules a state update but doesn't change the history variable captured in runAnalysis's closure. Since runAnalysis depends on [migrationName, namespace, history], the version called by handleAnalyse still sees the old history.
Prompt for agents
In AIAnalysisTab.tsx, the handleAnalyse callback calls setHistory([]) then runAnalysis(), but runAnalysis captures the old history value from its closure. Fix this by either: (1) passing the empty history as a parameter to runAnalysis so it doesn't read from the stale closure, e.g. refactor runAnalysis to accept an optional historyOverride parameter and use it instead of the state variable, or (2) use a ref to track history so the latest value is always available. The simplest fix is approach (1): add a parameter like `historyToSend?: ConversationTurn[]` to runAnalysis, default to `history`, and in handleAnalyse pass `[]` explicitly.
httpx==0.27.2
python-multipart==0.0.12
pydantic==2.9.2
pytest==8.3.2
🔴 Missing beautifulsoup4 dependency in requirements.txt causes /crawl endpoint to fail
crawler.py:10 imports from bs4 import BeautifulSoup, but beautifulsoup4 is not listed in vjailbreak-ai/requirements.txt. When the /crawl admin endpoint is called, it lazy-imports crawler.py (server.py:261), which will fail with ModuleNotFoundError: No module named 'bs4'. The Docker build installs only packages from requirements.txt, so the crawl functionality is completely broken in the container.
Suggested change:
-pytest==8.3.2
+pytest==8.3.2
+beautifulsoup4==4.12.3
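
Because crawler.py is imported lazily, a missing package like this only surfaces when /crawl is first called. A small import smoke test run inside the built image would catch it earlier. The helper below is a hypothetical sketch, not part of the PR; it uses only `importlib.util.find_spec` from the standard library and expects top-level module names.

```python
import importlib.util


def missing_modules(module_names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]
```

In the container, a check such as `assert missing_modules(["bs4", "httpx", "pydantic"]) == []` would fail at build or test time instead of at the first crawl request.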
What this PR does / why we need it
Which issue(s) this PR fixes
(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)
fixes #
Special notes for your reviewer
Testing done
Screen.Recording.2026-06-05.at.4.16.27.PM.mov