Skip to content

AI analysis of migration failures#1998

Open
OmkarDeshpande7 wants to merge 35 commits into
mainfrom
hackathon/ai-analyse
Open

AI analysis of migration failures#1998
OmkarDeshpande7 wants to merge 35 commits into
mainfrom
hackathon/ai-analyse

Conversation

@OmkarDeshpande7

@OmkarDeshpande7 OmkarDeshpande7 commented Jun 5, 2026

Copy link
Copy Markdown
Collaborator

What this PR does / why we need it

Which issue(s) this PR fixes

(optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged)

fixes #

Special notes for your reviewer

Testing done

Screen.Recording.2026-06-05.at.4.16.27.PM.mov

Open in Devin Review

OmkarDeshpande7 and others added 30 commits June 5, 2026 16:15
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ndpoint, and tests

Implements Phase 2 of the Analyse with AI feature: adds analyzer.py with
extract_error_keywords, build_user_message, build_github_issue, parse_claude_response,
query_rag, and analyze_migration; adds /analyze-migration POST endpoint to server.py;
adds error_catalog.md with known vJailbreak error patterns; updates Dockerfile and
docker-compose.yml for the vjailbreak-ai service. All 16 tests pass (13 analyzer + 3 server).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…8s Secret

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…Issue fallback

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add graphify-out/ to .gitignore and untrack all 833 files.
graphify regenerates these on every session, dirtying git status and
polluting unrelated commits. Files remain on disk for local use.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
AI calls used /vpw/v1/... directly. In production, these bypassed the
nginx ingress rewrite rule (/dev-api/sdk/(.*) → /$1) and fell through
to the UI nginx location / block which requires Basic Auth, causing
the htpasswd popup on GlobalSettings and AI Analysis pages.

All other vpwned API calls (helpers.ts, vddk.ts, version.ts) already
use the /dev-api/sdk/vpw/v1/... pattern correctly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…pod logs

Status.AgentName stores the node name (pod.Spec.NodeName), not the pod
name. Spec.PodRef is the correct field holding the v2v-helper pod name.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…lhost

vpwned pod has no nginx on localhost:80. Debug logs served by
vjailbreak-ui-service.migration-system.svc.cluster.local/debug-logs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pydantic rejects null for dict[str, Any] fields. migration_plan,
migration_template, network_mapping, storage_mapping are nil when absent,
causing 422 from vjailbreak-ai. nilToEmptyMap converts nil → {} before
JSON serialization.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Go: log full payload at DEBUG level before sending to vjailbreak-ai
- Go: log response body when vjailbreak-ai returns non-200
- Python: add RequestValidationError handler that logs request body + Pydantic errors at ERROR level

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…l in JSON

Go nil slice marshals to null; Pydantic rejects null for list[str].

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…history

- Track followUpLoading separately from loading so the result panel
  stays visible (no full-screen spinner) while a follow-up request
  is in flight; Send button shows inline spinner instead
- Include initial user turn in history so the model has context for
  follow-up questions (was only storing assistant turn)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Per-source log budget strategy:
- v2v pod logs: raw tail 100k chars (failures always at end)
- controller logs: error extraction context=5/tail=200, cap 50k chars
- debug logs: error extraction context=3/no-tail, cap 50k chars, max 5 files

Worst-case total ~100k tokens, well under the 200k API limit.

Also adds deploy/vjailbreak-ai-context-configmap.yaml with default
operator context covering vJailbreak architecture, common failure
patterns, and analysis guidelines for more accurate AI responses.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Backend: detect follow-up (question + non-empty history) and switch to
a conversational system prompt instead of the JSON-forcing analysis
prompt. Return is_followup:true so the frontend knows not to replace
the structured result.

Frontend: only call setResult() on initial analysis. Follow-up
responses are appended to history only. Render history.slice(2) as a
Q&A thread below the initial analysis so the conversation is visible.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude Haiku was returning JSON for follow-up questions because the
conversation history stored the raw JSON response as the assistant's
message, causing it to pattern-match the format.

Two fixes:
1. Frontend: store human-readable text in history (root cause + fix
   steps + summary) instead of raw JSON for the initial analysis turn
2. Backend: strengthen FOLLOWUP_SYSTEM_PROMPT with explicit "NO JSON"
   instruction and clearer prose-only directive

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
OmkarDeshpande7 and others added 5 commits June 5, 2026 16:15
Backend: build_github_issue is now called for all confidence levels.
should_open stays true only for none/low confidence (AI recommendation),
but prefill_url is always present.

Frontend: add a subtle "Still having issues? Open a GitHub Issue"
link at the bottom of the structured analysis for high/medium confidence
results, right-aligned in small secondary text.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Backend may not have been rebuilt yet. Use prefill_url from response
when available, otherwise fall back to a minimal prefilled title
constructed from migrationName. Link now unconditionally visible.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…vels

Both the confidence=none section and the structured result section
now use an outlined Button with endIcon instead of a plain text link,
making the GitHub Issue action more discoverable and easier to click.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Makefile:
- Add AI_IMG variable (quay.io/platform9/vjailbreak-ai)
- Add `vjailbreak-ai` target: docker build vjailbreak-ai/

packer.yml:
- Add AI_IMG env var and ai_img output to determine-release
- Add build-vjailbreak-ai job (parallel to other builds)
- Wire build-vjailbreak-ai into push-images needs/conditions
- Wire build-vjailbreak-ai into post-build needs/conditions
- Download and push vjailbreak-ai artifact in push-images
- envsubst vjailbreak-ai manifest → image_builder/deploy/08vjailbreak-ai.yaml

download_images.sh:
- Pull and export vjailbreak-ai image tar for VM baking

vjailbreak-ai/deploy/vjailbreak-ai.yaml:
- Deployment + Service + PVC for vjailbreak-ai in migration-system
- Reads ANTHROPIC_API_KEY + ADMIN_API_KEY from vjailbreak-ai-secret
- Mounts /data PVC for ChromaDB and context.md persistence

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…et key names

install.sh: after applying yamls (migration-system namespace exists),
generate a random admin-key with openssl rand -hex 32 and store in
vjailbreak-ai-secret. No manual step required. ANTHROPIC_API_KEY is
left empty — user sets it via Settings UI (ai_key_handler stores it
as api-key in the same secret).

vjailbreak-ai.yaml: fix secretKeyRef key names to match what
ai_key_handler.go writes: api-key (ANTHROPIC_API_KEY) and
admin-key (ADMIN_API_KEY).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

🚨 Security Vulnerability Summary

Security posture degraded

📊 Overall Changes

Metric Count
Total Added 3
Total Fixed 0
Net Change +3

🔍 Detailed Breakdown

📦 Gosec (Static Analysis)

Current Baseline Added Fixed Method
0 0 0 0 artifact

📦 Trivy (Dependency Scan)

Current Baseline Added Fixed Method
32 29 3 0 artifact

📋 Baseline Methods

  • 📦 artifact: Used stored report from main branch
  • 🔄 live_scan: Scanned base branch in real-time
  • ⚠️ no_baseline: No baseline available (all vulnerabilities treated as new)

🚨 Added Vulnerabilities

Trivy (Dependencies) - 3 Added

Target: vjailbreak-ai/requirements.txt
Package: python-multipart 0.0.12
Vulnerability: CVE-2024-53981
Severity: HIGH
Title: python-multipart: python-multipart has a DoS via deformation multipart/form-data boundary

Target: vjailbreak-ai/requirements.txt
Package: python-multipart 0.0.12
Vulnerability: CVE-2026-24486
Severity: HIGH
Title: python-multipart: Python-Multipart: Arbitrary file write via path traversal vulnerability

Target: vjailbreak-ai/requirements.txt
Package: python-multipart 0.0.12
Vulnerability: CVE-2026-42561
Severity: HIGH
Title: Python-Multipart is a streaming multipart parser for Python. Prior to ...


Only HIGH and CRITICAL severity vulnerabilities are tracked
Baseline: e10fa8b19e5f84f5ec23fbcf2a8382df95f8f142

@devin-ai-integration devin-ai-integration Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 4 potential issues.

View 7 additional findings in Devin Review.

Open in Devin Review

Comment thread vjailbreak-ai/analyzer.py
if not error_keywords or chroma_client is None:
return ""
try:
collection = chroma_client.get_collection("vjailbreak_docs")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 ChromaDB collection name mismatch between server.py and analyzer.py causes RAG retrieval to always fail

The server creates a ChromaDB collection named "vjailbreak" at vjailbreak-ai/server.py:69, but analyzer.py:170 tries to retrieve from a collection named "vjailbreak_docs" via chroma_client.get_collection("vjailbreak_docs"). Since the collection vjailbreak_docs is never created, get_collection raises an exception, which is silently caught by the except Exception at line 180. This means the RAG pipeline — the key differentiator for providing relevant documentation context to the AI — silently produces empty results on every analysis call. The AI will never receive relevant documentation snippets (virt-v2v docs, troubleshooting guides, error catalog) during migration failure analysis.

Suggested change
collection = chroma_client.get_collection("vjailbreak_docs")
collection = chroma_client.get_collection("vjailbreak")
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment thread vjailbreak-ai/server.py
Comment on lines +61 to +64
if not ANTHROPIC_API_KEY:
raise RuntimeError("ANTHROPIC_API_KEY is not set")
if not ADMIN_API_KEY:
raise RuntimeError("ADMIN_API_KEY is not set — generate one with: python -c \"import secrets; print(secrets.token_hex(32))\"")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 vjailbreak-ai server crashes on startup when API keys are not yet configured, contradicting optional: true deployment

The lifespan function in server.py:61-64 raises RuntimeError if ANTHROPIC_API_KEY or ADMIN_API_KEY environment variables are empty. However, the Kubernetes deployment manifests (vjailbreak-ai/deploy/vjailbreak-ai.yaml:54,60 and deploy/vjailbreak-ai/deployment.yaml:40,46) mark both secret key references as optional: true, which means the env vars will be empty strings when the secret key hasn't been set yet. The install.sh:222-226 only creates the secret with admin-key (no api-key). So immediately after installation, the pod reads an empty ANTHROPIC_API_KEY and crash-loops. The design doc explicitly states the pod should start without the Secret, but this code prevents that.

Prompt for agents
The lifespan function raises RuntimeError when ANTHROPIC_API_KEY or ADMIN_API_KEY are empty, but the K8s deployment uses optional: true on the secret references, meaning the pod should start without them. The install.sh script only creates the secret with admin-key, not api-key. The fix should allow the server to start gracefully without these keys, logging a warning instead of crashing. The /health endpoint should still work. Analysis endpoints should return a clear error (e.g. 503 with 'API key not configured') when ANTHROPIC_API_KEY is missing. Admin endpoints should return 401 when ADMIN_API_KEY is missing. This matches the design doc's intent: 'The Deployment uses optional: true so it starts without the Secret, but AI analysis calls will fail until the key is configured.'
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment on lines +98 to +102
const handleAnalyse = useCallback(() => {
setResult(null)
setHistory([])
runAnalysis()
}, [runAnalysis])

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Stale closure in handleAnalyse sends old conversation history when re-running analysis

handleAnalyse at AIAnalysisTab.tsx:98-102 calls setHistory([]) to reset history, then immediately calls runAnalysis(). However, runAnalysis (line 63) reads history from its closure, which still holds the previous value because React state updates are batched and not yet applied. On a re-analysis click, the old conversation history is sent to the backend instead of an empty array. This means the AI receives stale context from a prior analysis, potentially confusing the response.

Stale closure mechanism

setHistory([]) schedules a state update but doesn't change the history variable captured in runAnalysis's closure. Since runAnalysis depends on [migrationName, namespace, history], the version called by handleAnalyse still sees the old history.

Prompt for agents
In AIAnalysisTab.tsx, the handleAnalyse callback calls setHistory([]) then runAnalysis(), but runAnalysis captures the old history value from its closure. Fix this by either: (1) passing the empty history as a parameter to runAnalysis so it doesn't read from the stale closure, e.g. refactor runAnalysis to accept an optional historyOverride parameter and use it instead of the state variable, or (2) use a ref to track history so the latest value is always available. The simplest fix is approach (1): add a parameter like `historyToSend?: ConversationTurn[]` to runAnalysis, default to `history`, and in handleAnalyse pass `[]` explicitly.
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

httpx==0.27.2
python-multipart==0.0.12
pydantic==2.9.2
pytest==8.3.2

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Missing beautifulsoup4 dependency in requirements.txt causes /crawl endpoint to fail

crawler.py:10 imports from bs4 import BeautifulSoup, but beautifulsoup4 is not listed in vjailbreak-ai/requirements.txt. When the /crawl admin endpoint is called, it lazy-imports crawler.py (server.py:261), which will fail with ModuleNotFoundError: No module named 'bs4'. The Docker build installs only packages from requirements.txt, so the crawl functionality is completely broken in the container.

Suggested change
pytest==8.3.2
pytest==8.3.2
beautifulsoup4==4.12.3
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant