A fraud-intelligence platform for job seekers, built on Microsoft Foundry.
Most modern job scams don't look like scams: real company names, professional emails, polished offer letters. The fraud hides in the relationships between pieces of evidence — a "Google recruiter" whose domain was registered last month, a Reply-To that routes to a free mailbox, a USDT wallet that six prior victims reported under six different brand names.
So instead of asking "does this email look suspicious?", Verify My Interview asks "what can we prove about this recruiter, company, domain, phone number, and payment trail?" — and shows the proof.
- Evidence Collection: paste an email (raw headers are parsed for Reply-To mismatch, sender IP, and SPF/DKIM/DMARC results), upload a screenshot or PDF (Azure AI Document Intelligence OCR), drop a URL, or tell us what happened by voice (Azure AI Speech transcription). Voice covers scams that ran over WhatsApp calls or voice notes, where there is no email to forward.
- Investigation Engine: six specialist Foundry agents collaborate in sequence (Evidence → Verification → Research → Network → Critic → Report). Every finding is claim + evidence + confidence + source; the Critic strikes anything no tool result proves.
- Similar-Report Intelligence: de-identified reports live in Azure AI Search and in a backend linked-evidence model over hard identifiers (domains, emails, phones, payment handles; never names). The UI keeps this simple: users see plain "Similar reports" cards only when a prior report is useful.
- Workspace + History: the product is one ChatGPT-style investigation workspace, plus simple browser history and signed-in redacted case snapshots, so users can reopen past checks without understanding the internal graph machinery.
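The Critic's rule can be sketched as a filter over findings. This is a minimal illustration, assuming a finding counts as "proved" only when it cites at least one tool-result id and every cited id exists in the run's tool log; `Finding`, `ToolResult`, and `criticFilter` are hypothetical names, not the project's actual types.

```typescript
// Hypothetical shapes: the real types live in src/backend/agent/ and may differ.
interface ToolResult {
  id: string;    // e.g. "rdap:nimbus-talent-hr.com"
  tool: string;  // which verification tool produced it
  data: unknown; // raw tool output
}

interface Finding {
  claim: string;      // what the agent asserts
  evidence: string[]; // ids of ToolResults that back the claim
  confidence: number; // 0..1, agent-estimated
  source: string;     // agent that produced the finding
}

// Critic rule (sketch): keep a finding only if it cites at least one tool
// result and every cited id resolves to a result recorded in this run.
function criticFilter(findings: Finding[], toolLog: ToolResult[]): { kept: Finding[]; struck: Finding[] } {
  const known = new Set(toolLog.map((r) => r.id));
  const kept: Finding[] = [];
  const struck: Finding[] = [];
  for (const f of findings) {
    const proved = f.evidence.length > 0 && f.evidence.every((id) => known.has(id));
    (proved ? kept : struck).push(f);
  }
  return { kept, struck };
}

// Illustrative data only.
const toolLog: ToolResult[] = [{ id: "rdap:nimbus-talent-hr.com", tool: "rdap", data: {} }];
const { kept, struck } = criticFilter(
  [
    { claim: "Domain registered recently", evidence: ["rdap:nimbus-talent-hr.com"], confidence: 0.9, source: "verification" },
    { claim: "Recruiter is a known fraudster", evidence: [], confidence: 0.6, source: "research" },
  ],
  toolLog,
);
console.log(kept.length, struck.length); // 1 1
```

Requiring every cited id to resolve (rather than any) is a deliberately strict choice: a finding padded with one real citation and one invented one is struck.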
```mermaid
flowchart TB
    subgraph Intake["Evidence Collection"]
        A[Email / raw headers] --> P
        B[Screenshot / PDF<br/>Azure Document Intelligence OCR] --> P
        C[URL / text] --> P
        VN[Voice note<br/>Azure AI Speech transcription] --> P
    end
    P[Evidence Agent<br/>entity + header extraction]
    subgraph Engine["Investigation Engine: Microsoft Foundry agents"]
        P --> V[Verification Agent<br/>registry · RDAP · DNS tools]
        V --> R[Research Agent<br/>web/OSINT, cited]
        R --> N[Network Agent<br/>vector + graph match]
        N --> K[Critic Agent<br/>removes unproven claims]
        K --> S[Deterministic Scorer<br/>signals → 0-100]
        S --> W[Report Agent<br/>narrative + FTC/FBI/BBB citations]
    end
    subgraph Intel["Similar-Report Intelligence"]
        X[(Azure AI Search<br/>vector index)] <--> N
        G[Linked evidence<br/>domains · phones · wallets · trust levels] <--> N
    end
    W --> UI[Investigation Workspace<br/>stacked cards · report ack · history]
    UI <--> D[Conversational Detective<br/>case-aware follow-up]
    UI <--> H[(Cosmos DB<br/>accounts · redacted cases · reports)]
    UI <--> BS[(Private Blob Storage<br/>consented evidence files)]
```
Details: docs/ARCHITECTURE.md
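The Evidence Agent's header extraction can be illustrated with a small Reply-To check. This is a sketch, assuming headers are already unfolded into `Name: value` lines and using a tiny hardcoded free-mail list; `analyzeHeaders`, `domainOf`, and `FREE_MAIL_DOMAINS` are hypothetical, not the project's code.

```typescript
// Illustrative free-mail list; a real deployment would use a maintained list.
const FREE_MAIL_DOMAINS = new Set(["gmail.com", "outlook.com", "hotmail.com", "yahoo.com", "proton.me"]);

interface HeaderSignals {
  fromDomain: string | null;
  replyToDomain: string | null;
  replyToMismatch: boolean; // replies route somewhere other than the sender's domain
  replyToFreeMail: boolean; // ...and that somewhere is a free mailbox
}

// Pull the domain out of "Name <user@domain>" or a bare "user@domain".
function domainOf(value: string | undefined): string | null {
  if (!value) return null;
  const m = value.match(/@([A-Za-z0-9.-]+\.[A-Za-z]{2,})/);
  return m ? m[1].toLowerCase() : null;
}

function analyzeHeaders(raw: string): HeaderSignals {
  const headers = new Map<string, string>();
  for (const line of raw.split(/\r?\n/)) {
    if (line.trim() === "") break; // a blank line ends the header block
    const idx = line.indexOf(":");
    if (idx > 0) headers.set(line.slice(0, idx).trim().toLowerCase(), line.slice(idx + 1).trim());
  }
  const fromDomain = domainOf(headers.get("from"));
  const replyToDomain = domainOf(headers.get("reply-to"));
  const replyToMismatch = fromDomain !== null && replyToDomain !== null && fromDomain !== replyToDomain;
  return {
    fromDomain,
    replyToDomain,
    replyToMismatch,
    replyToFreeMail: replyToMismatch && replyToDomain !== null && FREE_MAIL_DOMAINS.has(replyToDomain),
  };
}

const signals = analyzeHeaders(
  "From: d.okafor@nimbus-talent-hr.com\nReply-To: nimbus.onboarding@gmail.com\nSubject: Final onboarding\n\nBody text",
);
console.log(signals.replyToMismatch, signals.replyToFreeMail); // true true
```

The exact-domain comparison is deliberately simple; a production check would compare organizational domains so that a legitimate Reply-To on a company subdomain is not flagged.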
- AI-first intake. Users paste the whole situation, attach files, or add voice in one place. There is no separate verify page, report page, network page, or dashboard to understand before getting help.
- Plain similar-report evidence. The backend still compares hard identifiers against prior reports, but users see only understandable match cards. No public graph UI is required to benefit from the intelligence layer.
- Agents never set the score. Reasoning plans the investigation; a transparent deterministic scorer sums evidence-backed signals into the 0–100 risk score. Every point is traceable.
- Cited guidance. Verdicts attach the FTC / FBI IC3 / BBB guidance that matches the signals the case actually triggered, with real source URLs.
- Always demoable. Every agent has a deterministic fallback. Unset the Azure env vars and the same case still completes; the trace just says `deterministic` instead of `foundry`.
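A transparent scorer of this kind can be sketched as a weighted sum with a per-signal breakdown. The signal names, weights, and band thresholds below are hypothetical (the thresholds are picked only so the bands agree with the scores in the eval results table), not the values in `src/backend/scorer/`. It also treats a case with no evidence as Inconclusive rather than Low Risk, mirroring the zero-evidence bug the evals guard against.

```typescript
// Hypothetical signal weights; the real table lives in src/backend/scorer/.
const WEIGHTS = new Map<string, number>([
  ["reply_to_free_mail", 20],
  ["spf_dmarc_fail", 25],
  ["upfront_fee", 30],
  ["crypto_payment", 25],
  ["whatsapp_only_contact", 10],
  ["linked_prior_report", 35],
]);

type RiskLevel = "Inconclusive" | "Low Risk" | "Needs More Verification" | "Suspicious" | "Likely Scam";

interface ScoreLine { signal: string; points: number }

function score(signals: string[], evidenceCount: number): { score: number; level: RiskLevel; breakdown: ScoreLine[] } {
  // Every point comes from a named, de-duplicated signal, so the breakdown explains the total.
  const breakdown: ScoreLine[] = Array.from(new Set(signals))
    .filter((s) => WEIGHTS.has(s))
    .map((s) => ({ signal: s, points: WEIGHTS.get(s)! }));
  const raw = breakdown.reduce((sum, line) => sum + line.points, 0);
  const total = Math.min(100, Math.max(0, raw)); // clamp to 0-100

  // No evidence at all is "Inconclusive", never a reassuring "Low Risk".
  let level: RiskLevel;
  if (evidenceCount === 0) level = "Inconclusive";
  else if (total < 10) level = "Low Risk";
  else if (total < 40) level = "Needs More Verification";
  else if (total < 70) level = "Suspicious";
  else level = "Likely Scam";
  return { score: total, level, breakdown };
}

const result = score(["reply_to_free_mail", "upfront_fee", "crypto_payment", "upfront_fee"], 3);
console.log(result.score, result.level); // 75 Likely Scam
```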
Demo video: https://youtu.be/9OACXRezCKc. The live application is deployed at https://vmi-online-3907.azurewebsites.net and the backend API at https://vmi-api-3907.azurewebsites.net. To test the project, watch the demo video first for the problem, solution, and workflow; then open the live app in your browser, submit a verification case, and review the generated scam-risk report. The API URL lets you confirm backend availability and health, and its integration with the live frontend.
npm run eval runs every scenario in tests/test_cases/ through the full
pipeline in reproducible offline mode (external keys scrubbed) and asserts
risk-level band, score range, required/forbidden signals, and similar-report
expectations. npm test gates the same suite in Jest. Current run (13/13):
| Case | Level | Score | Result |
|---|---|---|---|
| Header-spoofed corporate email (SPF/DMARC fail) | Likely Scam | 77 | PASS |
| Inconclusive - insufficient evidence | Inconclusive | 0 | PASS |
| Legitimate job (Microsoft) | Low Risk | 0 | PASS |
| Obvious scam (Google impersonation) | Likely Scam | 100 | PASS |
| Ring-linked offer (shared scam infrastructure) | Likely Scam | 100 | PASS |
| SA brand-impersonation via job aggregator | Needs More Verification | 12 | PASS |
| SA document-harvest via free-host link | Suspicious | 47 | PASS |
| Legitimate SA youth learnership (control) | Low Risk | 0 | PASS |
| Legitimate recruiter on an unusual TLD (FP trap) | Low Risk | 2 | PASS |
| SA SMS reply-bait smishing | Needs More Verification | 18 | PASS |
| SA upfront-fee + WhatsApp-only retail scam | Suspicious | 50 | PASS |
| Suspicious - mixed signals | Suspicious | 65 | PASS |
| Voice report training-fee narrative | Likely Scam | 70 | PASS |
The South African cases are synthetic scenarios modelled on real scam
patterns (aggregator/free-host application channels, rand-denominated "induction
fees", ID/SARS/banking-proof harvesting). The two legitimate controls — a real
recruiter on an unusual .us TLD and a normal learnership — must not be
flagged: falsely flagging a real recruiter is a defamation risk.
These same evals caught real bugs during development — a brand-word entity-resolution false positive ("gift card: Microsoft" linking every email that mentioned Microsoft) and a zero-evidence case being nudged into reassuring "Low Risk".
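The per-scenario checks described above can be pictured as follows. The `Scenario` and `PipelineResult` shapes are assumptions built from that description (risk-level band, score range, required and forbidden signals, similar-report expectations), not the actual file format in `tests/test_cases/`; the signal name and score range are hypothetical.

```typescript
// Assumed scenario shape; the real files in tests/test_cases/ may differ.
interface Scenario {
  name: string;
  expectLevel: string;
  scoreRange: [number, number]; // inclusive
  requiredSignals: string[];
  forbiddenSignals: string[];
  expectSimilarReports: boolean;
}

interface PipelineResult {
  level: string;
  score: number;
  signals: string[];
  similarReports: unknown[];
}

// Returns human-readable failures; an empty list means PASS.
function checkScenario(s: Scenario, r: PipelineResult): string[] {
  const failures: string[] = [];
  if (r.level !== s.expectLevel) failures.push(`level ${r.level} != ${s.expectLevel}`);
  const [lo, hi] = s.scoreRange;
  if (r.score < lo || r.score > hi) failures.push(`score ${r.score} outside [${lo}, ${hi}]`);
  for (const sig of s.requiredSignals) {
    if (!r.signals.includes(sig)) failures.push(`missing signal ${sig}`);
  }
  for (const sig of s.forbiddenSignals) {
    if (r.signals.includes(sig)) failures.push(`forbidden signal ${sig}`);
  }
  if (s.expectSimilarReports !== (r.similarReports.length > 0)) {
    failures.push(`similar reports: expected ${s.expectSimilarReports ? "some" : "none"}`);
  }
  return failures;
}

// A false-positive trap: a legitimate recruiter must not be linked to prior reports.
const fpTrap: Scenario = {
  name: "Legitimate recruiter on an unusual TLD",
  expectLevel: "Low Risk",
  scoreRange: [0, 9],
  requiredSignals: [],
  forbiddenSignals: ["linked_prior_report"],
  expectSimilarReports: false,
};
console.log(checkScenario(fpTrap, { level: "Low Risk", score: 2, signals: [], similarReports: [] })); // []
```

Forbidden signals are what make control cases meaningful: a brand-word entity-resolution bug would surface as a forbidden linked-report signal even if the final score happened to stay low.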
```bash
npm install
npm run build   # backend tsc + frontend vite -> public/
npm start       # http://localhost:3000
```

Or for development: `npm run dev` (API) + `npm run dev:web` (Vite).

Try a case:

```bash
curl -X POST http://localhost:3000/analyze \
  -H "Content-Type: application/json" \
  -d '{"evidence":"From: d.okafor@nimbus-talent-hr.com\nReply-To: nimbus.onboarding@gmail.com\nSubject: Final onboarding - QA Analyst at Google\n\nA refundable compliance deposit of $200 is required, payable in USDT to wallet TQrKp4mNbu77 or Zelle: nimbus-onboard. Reach us on WhatsApp +1 (332) 555-0144."}'
```

Without Azure configured, the pipeline runs deterministically. The response still includes the verdict, the six-stage trace, signals, guidance, and similar-report matches when the seeded corpus contains related evidence.
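The deterministic fallback can be sketched as a stage runner that prefers a model-backed agent when one is supplied and otherwise runs a pure function, recording which path ran. `runStage`, `TraceEntry`, and `extractEmails` are hypothetical; only the `foundry`/`deterministic` trace labels come from this README. Falling back when the model call throws is an additional assumption, and in the real app the Foundry agent would presumably be supplied only when `AZURE_AI_PROJECT_ENDPOINT` is set.

```typescript
type Mode = "foundry" | "deterministic";
interface TraceEntry { stage: string; mode: Mode; note?: string }
type Agent<I, O> = (input: I) => Promise<O>;

// Hypothetical stage runner: use the Foundry agent if provided; otherwise,
// or if it throws (assumption), run the deterministic fallback.
async function runStage<I, O>(
  stage: string,
  input: I,
  foundryAgent: Agent<I, O> | null,
  fallback: (input: I) => O,
  trace: TraceEntry[],
): Promise<O> {
  if (foundryAgent) {
    try {
      const out = await foundryAgent(input);
      trace.push({ stage, mode: "foundry" });
      return out;
    } catch (err) {
      trace.push({ stage, mode: "deterministic", note: `foundry error: ${String(err)}` });
      return fallback(input);
    }
  }
  trace.push({ stage, mode: "deterministic" });
  return fallback(input);
}

// Illustrative deterministic fallback for the Evidence stage: regex email extraction.
function extractEmails(text: string): string[] {
  return text.match(/[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g) ?? [];
}

// With no Azure configuration, no model-backed agent is passed in.
const trace: TraceEntry[] = [];
runStage("evidence", "From: d.okafor@nimbus-talent-hr.com", null, extractEmails, trace).then((emails) => {
  console.log(emails, trace[0].mode); // [ 'd.okafor@nimbus-talent-hr.com' ] deterministic
});
```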
Auth is Microsoft Entra ID (`DefaultAzureCredential`); no API keys are hardcoded.

- Create a Foundry project and deploy a model (e.g. `gpt-4o`).
- Run `az login` so `DefaultAzureCredential` can pick up your Azure CLI sign-in.
- Configure `.env` (see `.env.example` for every subsystem):

```bash
AZURE_AI_PROJECT_ENDPOINT=https://<resource>.services.ai.azure.com/api/projects/<project>
AZURE_AI_MODEL_DEPLOYMENT=gpt-4o
# optional extras
AZURE_SEARCH_ENDPOINT=...    # similar-report matching
AZURE_SEARCH_API_KEY=...
AZURE_DOCINT_ENDPOINT=...    # OCR uploads
AZURE_DOCINT_KEY=...
AZURE_SPEECH_REGION=...      # voice transcription ("Tell Us What Happened")
AZURE_SPEECH_KEY=...
SERPAPI_API_KEY=...          # Research agent web/OSINT
```

- Seed the similar-report index:

```bash
npm run seed:network
```
Deployment to Azure Container Apps is covered by the
.agents/skills/deploy-azure-foundry skill.
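`GET /health` reports per-subsystem status flags. A minimal sketch of deriving such flags from the environment variables listed above, assuming a subsystem counts as configured only when all of its variables are non-blank; `SUBSYSTEMS` and `subsystemFlags` are hypothetical, and the real endpoint may also probe connectivity.

```typescript
// Env-var groups as listed in this README; a subsystem is "on" only if every var is set.
const SUBSYSTEMS: Record<string, string[]> = {
  foundry: ["AZURE_AI_PROJECT_ENDPOINT", "AZURE_AI_MODEL_DEPLOYMENT"],
  search: ["AZURE_SEARCH_ENDPOINT", "AZURE_SEARCH_API_KEY"],
  ocr: ["AZURE_DOCINT_ENDPOINT", "AZURE_DOCINT_KEY"],
  speech: ["AZURE_SPEECH_REGION", "AZURE_SPEECH_KEY"],
  research: ["SERPAPI_API_KEY"],
};

// Takes the environment as a plain map so it can be tested without touching process.env.
function subsystemFlags(env: Record<string, string | undefined>): Record<string, boolean> {
  const flags: Record<string, boolean> = {};
  for (const [name, vars] of Object.entries(SUBSYSTEMS)) {
    flags[name] = vars.every((v) => (env[v] ?? "").trim() !== "");
  }
  return flags;
}

// Placeholder values for illustration.
const flags = subsystemFlags({
  AZURE_AI_PROJECT_ENDPOINT: "https://example.services.ai.azure.com/api/projects/demo",
  AZURE_AI_MODEL_DEPLOYMENT: "gpt-4o",
  AZURE_SEARCH_ENDPOINT: "https://example.search.windows.net",
});
console.log(flags.foundry, flags.search); // true false
```

In a Node server this would be called as `subsystemFlags(process.env)`; a half-configured subsystem (endpoint without key) reports `false`, which matches the "unset the env vars and fall back" behaviour described earlier.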
The app stays anonymous when auth is unconfigured. When Microsoft Entra External ID is configured, the browser uses an env-gated PKCE flow:

```bash
AUTH_ISSUER=https://<tenant>.ciamlogin.com/<tenantId>/v2.0
AUTH_AUDIENCE=<api-application-client-id>
VITE_AUTH_CLIENT_ID=<spa-application-client-id>
VITE_AUTH_AUTHORITY=https://<tenant>.ciamlogin.com/<tenantId>
VITE_AUTH_SCOPE=api://<api-application-client-id>/access_as_user
```

Signed-in users get account history from `/cases`. Original evidence files are stored only after the user enables evidence retention; they are then uploaded through the private `/evidence` endpoint and linked to the case by `evidenceId`. `DELETE /me` erases the account, cases, usage, and stored evidence; de-identified community reports remain for fraud prevention.
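For reference, the core of the PKCE step (RFC 7636, `S256` method) derives a code challenge from a random verifier. This sketch uses Node's `node:crypto` module purely for illustration; a browser implementation would use the Web Crypto API, an auth library would normally handle this for you, and none of it is the project's SPA code. The function names are hypothetical.

```typescript
import { createHash, randomBytes } from "node:crypto";

// RFC 7636: the verifier is 43-128 characters from [A-Z a-z 0-9 - . _ ~].
// 32 random bytes encode to exactly 43 base64url characters.
function createCodeVerifier(): string {
  return randomBytes(32).toString("base64url");
}

// S256 method: BASE64URL(SHA256(ASCII(code_verifier))), without padding.
function codeChallengeS256(verifier: string): string {
  return createHash("sha256").update(verifier, "ascii").digest("base64url");
}

const verifier = createCodeVerifier();
const challenge = codeChallengeS256(verifier);
// The challenge (with code_challenge_method=S256) goes on the authorize request;
// the verifier is sent later on the token request.
console.log(verifier.length, challenge.length); // 43 43
```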
| Endpoint | Purpose |
|---|---|
| `POST /analyze` | Investigate evidence → report + trace + signals + similar reports |
| `POST /chat` | Case-aware detective follow-up |
| `POST /transcribe` | Transcribe a voice note (Azure AI Speech) for investigation |
| `POST /upload` | OCR a screenshot/PDF via Document Intelligence |
| `POST /report` | Submit a de-identified scam report |
| `GET /reports/pending` | Admin only: pending public reports for moderation |
| `POST /reports/:id/moderate` | Admin only: approve/reject a pending report |
| `POST /share` | Save a finished report result for sharing |
| `GET /shared/:id` | Load a shared report result |
| `GET /me` / `DELETE /me` | Signed-in profile/usage and POPIA erasure |
| `PUT /me/consent` | Set evidence-storage consent |
| `GET /cases` / `GET /cases/:id` | Signed-in redacted case history |
| `POST /evidence` / `GET /evidence/:fileId` | Consented private evidence storage/readback |
| `GET /health` | Per-subsystem status flags |
| `GET /docs` | API documentation |
Internal operations endpoints also exist for the linked-evidence corpus and
signed-in account/evidence flows; see /docs on a running server.
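The linked-evidence idea can be sketched as an inverted index from normalized hard identifiers to report ids. Names and brand words are never indexed, which is exactly the class of bug the evals caught ("gift card: Microsoft" linking every Microsoft email). `normalizeIdentifier`, `LinkedEvidenceIndex`, the identifier kinds, and the report ids are hypothetical; the real model lives in `src/backend/network/`.

```typescript
type IdKind = "domain" | "email" | "phone" | "payment";
interface Identifier { kind: IdKind; value: string }

// Normalize so trivially different spellings of the same identifier collide.
function normalizeIdentifier(id: Identifier): string {
  const raw = id.value.trim();
  let v: string;
  switch (id.kind) {
    case "phone": v = raw.replace(/\D/g, ""); break; // keep digits only
    case "domain": v = raw.toLowerCase().replace(/^www\./, ""); break;
    case "email": v = raw.toLowerCase(); break;
    default: v = raw; // payment handles: wallet addresses can be case-sensitive
  }
  return `${id.kind}:${v}`;
}

class LinkedEvidenceIndex {
  private byId = new Map<string, Set<string>>();

  add(reportId: string, ids: Identifier[]): void {
    for (const id of ids) {
      const key = normalizeIdentifier(id);
      if (!this.byId.has(key)) this.byId.set(key, new Set());
      this.byId.get(key)!.add(reportId);
    }
  }

  // Prior reports sharing at least one hard identifier, with the shared keys as proof.
  similar(ids: Identifier[]): Map<string, string[]> {
    const hits = new Map<string, string[]>();
    for (const id of ids) {
      const key = normalizeIdentifier(id);
      for (const reportId of this.byId.get(key) ?? []) {
        hits.set(reportId, [...(hits.get(reportId) ?? []), key]);
      }
    }
    return hits;
  }
}

// Illustrative synthetic reports.
const index = new LinkedEvidenceIndex();
index.add("report-17", [
  { kind: "payment", value: "TQrKp4mNbu77" },
  { kind: "domain", value: "other-brand-careers.com" },
]);
index.add("report-42", [{ kind: "phone", value: "+1 332 555 0144" }]);

const matches = index.similar([
  { kind: "email", value: "nimbus.onboarding@gmail.com" },
  { kind: "phone", value: "+1 (332) 555-0144" },
  { kind: "payment", value: "TQrKp4mNbu77" },
]);
console.log([...matches.keys()]); // [ 'report-42', 'report-17' ]
```

Returning the shared keys alongside each match keeps "Similar reports" cards explainable: each card can say which phone or wallet it shares with the current case.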
This is a risk assessment tool, not an accusation engine: it reports evidence-backed risk with confidence and sources, and prefers "needs more verification" over false alarms. Seeded report data in this repo is synthetic demo data. Evidence is treated as untrusted input.
Privacy (POPIA). Sensitive identifiers — South African ID numbers, bank
accounts, payment cards — are stripped from evidence before anything is logged
or stored, while scam indicators (domains, emails, phones) are preserved as the
investigative evidence they are. Every channel — typed, OCR'd, or
voice-transcribed — passes the same redaction boundary. Raw pasted text and raw
audio are not retained by default; original files are stored only for signed-in
users who explicitly enable evidence retention. The SPA exposes /privacy and
/terms for in-product notice and use terms. See docs/PRIVACY.md for the
full POPIA posture (lawful basis, minimization, retention, special personal
information, data-subject rights) and docs/PRODUCTION_READINESS.md
for the grounding, safety, evaluation, and hardening roadmap.
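A minimal sketch of such a redaction boundary, assuming regex detection confirmed by a Luhn checksum (payment card numbers and the final check digit of South African ID numbers both use Luhn). The patterns and the `redact` name are illustrative; a bank-account pattern is omitted because account-number formats vary, and the real rules live in `src/backend/privacy/`. Note that phones, emails, and domains pass through untouched as scam indicators.

```typescript
// Luhn checksum: used by payment cards and by the SA ID number's final digit.
function luhnValid(digits: string): boolean {
  let sum = 0;
  let double = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48;
    if (double) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    double = !double;
  }
  return sum % 10 === 0;
}

// 13 contiguous digits (YYMMDD SSSS C A Z), redacted only if Luhn-valid.
const SA_ID = /\b\d{13}\b/g;
// 13-19 digits, optionally grouped by single spaces or dashes: candidate card number.
const CARD = /\b\d(?:[ -]?\d){12,18}\b/g;

function redact(text: string): string {
  return text
    .replace(SA_ID, (m) => (luhnValid(m) ? "[REDACTED_SA_ID]" : m))
    .replace(CARD, (m) => (luhnValid(m.replace(/[ -]/g, "")) ? "[REDACTED_CARD]" : m));
}

const input =
  "ID 8001015009087, card 4111 1111 1111 1111, WhatsApp +1 (332) 555-0144, reply to nimbus.onboarding@gmail.com";
console.log(redact(input));
// ID [REDACTED_SA_ID], card [REDACTED_CARD], WhatsApp +1 (332) 555-0144, reply to nimbus.onboarding@gmail.com
```

Requiring the checksum keeps arbitrary long reference numbers in the evidence, at the cost of missing mistyped identifiers; a stricter policy could redact any 13-digit run.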
```text
src/backend/agent/      orchestrator + the six specialist agents (Foundry runner + fallbacks)
src/backend/tools/      verification tool adapters (registry, RDAP/DNS, patterns, web research)
src/backend/network/    AI Search corpus, entity graph, trust levels, seed data
src/backend/scorer/     signal engine + deterministic scorer
src/backend/knowledge/  FTC/FBI/BBB guidance matching
src/backend/privacy/    PII redaction & data minimization (POPIA)
src/backend/ocr/        Azure Document Intelligence
src/backend/scripts/    seedNetwork, runEvals, smoke
frontend/               React + Vite + Tailwind (Sentinel UI)
tests/test_cases/       eval scenarios (run: npm run eval)
docs/                   ARCHITECTURE, PRIVACY, PRODUCTION_READINESS, SPEC, ...
```
License: MIT