Skip to content

feat(ai): implement predictive friction clusters for zero-day incident detection#87

Open
Maayank18 wants to merge 2 commits into
vicharanashala:mainfrom
Maayank18:main
Open

feat(ai): implement predictive friction clusters for zero-day incident detection#87
Maayank18 wants to merge 2 commits into
vicharanashala:mainfrom
Maayank18:main

Conversation

@Maayank18

Copy link
Copy Markdown

🚀 What is this PR?

Transforms the FAQ platform from a reactive system into a proactive one.
This PR introduces Predictive Friction Clusters—a background daemon that detects sudden spikes in semantically identical, unfulfilled search queries (e.g., a system outage). Once a volume threshold is met, it leverages an LLM to generate an incident warning and deploys a global banner, deflecting duplicate support questions before they are ever asked.

🧠 Architectural Changes

  • Data Layer: Upgraded SearchLog schema to asynchronously capture and store 1024-dim query embeddings without blocking the user's search request path.
  • Clustering Engine: Added frictionClusterer.ts, running a DBSCAN algorithm (epsilon: 0.65) on a 10-minute cron interval over a sliding 30-minute window to detect dense query vectors.
  • LLM Integration: Automatically prompts Anthropic's Claude to interpret the root intent of the clustered queries and generate a clean JSON incident report (hardened with robust Regex parsing to bypass LLM conversational hallucinations).
  • Frontend UI:
    • GlobalAlertBanner: A persistent, polling layout component that dynamically color-codes severity (Info, Warning, Critical) to deflect users.
    • AdminAlerts: A new management dashboard at /admin/alerts allowing support teams to view the triggering queries and manually resolve the incident.

🛡️ Safety & Reliability Audit

  • Non-Blocking Telemetry: Vector embeddings are piped to MongoDB via a batched asynchronous queue (BATCH_MAX_SIZE=50), guaranteeing 0ms latency impact on the core search API.
  • Cron Resilience: The clustering loop implements isolated try/catch execution. A single malformed LLM response or DB timeout will gracefully log and skip, leaving the rest of the daemon unblocked.
  • UI Resilience: Components strictly validate API payloads (res.data?.alerts || []), preventing React rendering crashes in the event of upstream proxy errors.

Author: Mayank Garg gargmayank1805@gmail.com

…t detection

This introduces an autonomous incident detection pipeline that proactively deflects support tickets by identifying emerging search trends in real-time.
- Add embedding vectors to SearchLog for semantic grouping.
- Implement DBSCAN clustering (frictionClusterer.ts) over a 30-min rolling window.
- Leverage Anthropic LLM to auto-generate incident titles and descriptions.
- Add GlobalAlert model and /api/alerts API endpoints.
- Create dynamic GlobalAlertBanner layout component for immediate user deflection.
- Create /admin/alerts dashboard for visibility and incident resolution.
- Enhance async error handling to prevent cron loop failures or UI crashes.
Author: Mayank Garg <gargmayank1805@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant