You close your laptop. 7 AI agents wake up. By the time your morning coffee is ready, the reports are already there.
D365 Observability Hub is a Claude Code-native overnight monitoring system for Dynamics 365 Finance & Operations. It uses an AI agent architecture to automatically query your Azure App Insights resource, detect performance issues, and write reports — while you sleep.
No server. No Node.js app. No App Insights API keys. No hardcoding. Just markdown files and Claude Code.
| Domain | Source Table | What It Detects |
|---|---|---|
| Batch Jobs | customEvents | Slow jobs, failures, thread saturation, CPU/DTU throttling |
| DMF | customEvents | Failed exports, staging errors, slow jobs, aborted runs |
| Exceptions | exceptions | X++ errors, new exception types, batch correlations |
| Forms | pageViews | Slow form loads, P95/P99, active sessions, regional issues |
All table names, event names, column names and field mappings are discovered dynamically from your App Insights — nothing is hardcoded.
batch-agent.md had:
name == "BatchTaskFinished" ← hardcoded event name
tolong(customDimensions.elapsedMilliseconds) ← hardcoded field
> 60000 ← hardcoded threshold
ago(1h) ← hardcoded time window
App ID: 9d57480a-... ← hardcoded in every agent
batch-agent.md now reads:
name in (schema.domains.batch.events) ← discovered at runtime
tolong(customDimensions.{schema.domains.batch.durationField}) ← from schema
> schema.thresholds.batch.slow_job_warning_ms ← from thresholds.json
ago({schema.lookbackWindow}m) ← from thresholds.json
App ID from schemas/active.json only ← never in agent files
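To make the v1.0 to v2.0 change concrete, here is a minimal Python sketch of how a query could be assembled from schema values instead of literals. The dictionary layout mirrors the placeholders above (`domains.batch.events`, `durationField`, `thresholds`, `lookbackWindow`), but the helper `build_slow_batch_kql`, the sample values, and the use of Python at all are illustrative assumptions. In the project itself this substitution happens inside the markdown agent files.

```python
def build_slow_batch_kql(schema: dict) -> str:
    """Assemble a slow-batch-job KQL query from schema values only."""
    batch = schema["domains"]["batch"]
    # Quote each discovered event name for a KQL `in (...)` list.
    events = ", ".join(f'"{name}"' for name in batch["events"])
    duration = f'tolong(customDimensions.{batch["durationField"]})'
    threshold = schema["thresholds"]["batch"]["slow_job_warning_ms"]
    window = schema["lookbackWindow"]
    return (
        "customEvents\n"
        f"| where timestamp > ago({window}m)\n"
        f"| where name in ({events})\n"
        f"| where {duration} > {threshold}"
    )


# Hypothetical schema: the v1.0 literals reused as if they had been discovered.
example_schema = {
    "lookbackWindow": 60,
    "domains": {
        "batch": {
            "events": ["BatchTaskFinished"],
            "durationField": "elapsedMilliseconds",
        }
    },
    "thresholds": {"batch": {"slow_job_warning_ms": 60000}},
}

print(build_slow_batch_kql(example_schema))
```

Changing an event name, field, or threshold then means changing data, not the query template.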
| Feature | v1.0 | v2.0 |
|---|---|---|
| Event names | Hardcoded in agent files | Discovered live from App Insights |
| Column names | Hardcoded in agent files | Discovered from customDimensions |
| Thresholds | Hardcoded in CLAUDE.md | schemas/thresholds.json — auto-generated |
| App ID | Hardcoded in every agent | schemas/active.json only |
| Status values | Hardcoded ("Finished", "Error") | Discovered dynamically |
| Portability | D365 FO only | Any Azure App Insights resource |
| Dashboard | Manual only | Auto-generated after every cycle |
| schema-analyst | Read pre-defined schema | Queries App Insights live at startup |
schemas/active.json ← YOU fill this in (App ID + tenant only)
│
▼
/load-schema → schema-analyst runs
│
├── Queries App Insights live → discovers tables
├── Queries each table → discovers event names
├── Queries each event → discovers customDimensions fields
├── Reads or auto-generates → schemas/thresholds.json
└── Saves everything to → schemas/parsed-schema.json
│
▼
parsed-schema.json ← contains BOTH:
· thresholds (from thresholds.json)
· discovered schema (from live queries)
│
├── All 7 agents read from here (no hardcoding)
└── query-runner reads App ID from here only
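The flow above ends with one file that holds both the thresholds and the discovered schema. A rough Python sketch of that merge step follows. The key names (`connection`, `thresholds`, `domains`) and the function `build_parsed_schema` are hypothetical; the real layout of `parsed-schema.json` is whatever schema-analyst writes.

```python
import json


def build_parsed_schema(active: dict, thresholds: dict, discovered: dict) -> dict:
    """Combine connection details, thresholds, and discovered schema."""
    return {
        # Connection details come from active.json.
        "connection": {"appId": active["appId"], "tenant": active["tenant"]},
        # Thresholds come from thresholds.json.
        "thresholds": thresholds,
        # Tables, events, and fields come from the live discovery queries.
        "domains": discovered,
    }


# Placeholder inputs standing in for the three sources.
active = {"appId": "YOUR_APP_INSIGHTS_APP_ID", "tenant": "YOUR_TENANT.onmicrosoft.com"}
thresholds = {"batch": {"slow_job_warning_ms": 60000}}
discovered = {"batch": {"table": "customEvents", "events": ["BatchTaskFinished"]}}

parsed = build_parsed_schema(active, thresholds, discovered)
print(json.dumps(parsed, indent=2))
```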
/monitor starts
│
▼
Reads schemas/parsed-schema.json
│
▼
4 specialist agents spawn in parallel
├── batch-agent → queries customEvents using discovered batch events
├── dmf-agent → queries customEvents using discovered DMF events
├── exception-agent → queries exceptions table
└── form-agent → queries pageViews table
│
▼
Reports written → reports/YYYY-MM-DD/{agent}-report-{timestamp}.md
│
▼
Alerts written → alerts/alert-{agent}-{timestamp}.json
(only when threshold breached)
│
▼
Dashboard auto-generated → reports/dashboard.html ← NEW in v2.0
│
▼
Sleep 60 minutes
│
▼
Repeat all night ↻
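One cycle of the loop above can be pictured with the following Python sketch, which runs four placeholder specialists in parallel and builds each report path in the `reports/YYYY-MM-DD/{agent}-report-{timestamp}.md` shape. The timestamp format, the `run_agent` placeholder, and the thread pool are assumptions for illustration. Claude Code spawns the real agents, and the sleep-and-repeat step is omitted.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime

SPECIALISTS = ["batch-agent", "dmf-agent", "exception-agent", "form-agent"]


def report_path(agent: str, now: datetime) -> str:
    """Build reports/YYYY-MM-DD/{agent}-report-{timestamp}.md for one run."""
    day = now.strftime("%Y-%m-%d")
    stamp = now.strftime("%Y%m%dT%H%M%S")  # timestamp format is an assumption
    return f"reports/{day}/{agent}-report-{stamp}.md"


def run_agent(agent: str, now: datetime) -> str:
    """Placeholder for one specialist run; returns where its report would go."""
    return report_path(agent, now)


def run_cycle(now: datetime) -> list[str]:
    """Run the four specialists in parallel and collect their report paths."""
    with ThreadPoolExecutor(max_workers=len(SPECIALISTS)) as pool:
        return list(pool.map(lambda agent: run_agent(agent, now), SPECIALISTS))


paths = run_cycle(datetime(2025, 1, 15, 2, 0, 0))
for path in paths:
    print(path)
```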
- Node.js v18+ — nodejs.org
- Azure CLI — learn.microsoft.com/cli/azure
- Anthropic API key — console.anthropic.com
- Azure App Insights resource with D365 FO telemetry enabled
Download and install from nodejs.org — version 18 or higher required.
node --version # confirm v18+
npm install -g @anthropic-ai/claude-code
claude --version # confirm installed
Get your key from console.anthropic.com → API Keys → Create Key
# Windows (permanent — survives restarts)
setx ANTHROPIC_API_KEY "sk-ant-your-key-here"
# Mac/Linux
export ANTHROPIC_API_KEY="sk-ant-your-key-here"
# Add to ~/.bashrc or ~/.zshrc to make permanent
Windows: Close and reopen your terminal after `setx` for the key to take effect.
az login --tenant YOUR_TENANT.onmicrosoft.com
# Confirm you are logged in
az account show --query user.name --output tsv
The system uses Azure AD authentication via `az rest` — no App Insights API keys needed.
git clone https://github.com/prashantdce21MSFT/d365-observability-hub.git
cd d365-observability-hub
Or download the ZIP from Releases and extract it.
Open schemas/active.json — this is the only file you need to edit:
{
"appId": "YOUR_APP_INSIGHTS_APP_ID",
"appName": "YOUR_APP_INSIGHTS_NAME",
"resourceGroup": "YOUR_RESOURCE_GROUP",
"tenant": "YOUR_TENANT.onmicrosoft.com",
"azRest": "az rest --method post --url \"https://api.applicationinsights.io/v1/apps/YOUR_APP_INSIGHTS_APP_ID/query\" --headers \"Content-Type=application/json\" --body \"{\\\"query\\\": \\\"{query}\\\"}\""
}
Find your App ID:
az monitor app-insights component show \
--app YOUR_APP_INSIGHTS_NAME \
--resource-group YOUR_RESOURCE_GROUP \
--query appId \
--output tsv
Or in Azure Portal: App Insights → YOUR_RESOURCE → Properties → Application ID
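The `azRest` template in `schemas/active.json` needs several layers of quote escaping. As a hedged illustration of what that command does, the Python sketch below builds the same `az rest` call as an argument list and lets `json.dumps` handle the escaping of the KQL text. The function name `build_az_rest_args` and the sample query are hypothetical. The URL, method, and header values are taken from the template.

```python
import json


def build_az_rest_args(app_id: str, kql: str) -> list[str]:
    """Return an argument list for `az rest` that posts one KQL query."""
    url = f"https://api.applicationinsights.io/v1/apps/{app_id}/query"
    # json.dumps escapes any quotes or newlines inside the KQL text.
    body = json.dumps({"query": kql})
    return [
        "az", "rest",
        "--method", "post",
        "--url", url,
        "--headers", "Content-Type=application/json",
        "--body", body,
    ]


# Hypothetical query used only to show the escaping.
sample_kql = 'customEvents | where name == "BatchTaskFinished" | take 5'
args = build_az_rest_args("YOUR_APP_INSIGHTS_APP_ID", sample_kql)
print(args)
```

Keeping the arguments as a list rather than one shell string sidesteps the nested backslash escaping that the single-line template requires.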
claude
> hello
> /load-schema
Schema-analyst will:
- Read your App ID from `schemas/active.json`
- Query App Insights live to discover all tables
- Discover all event names in each table
- Discover all `customDimensions` fields per event
- Auto-generate `schemas/thresholds.json` with D365 defaults
- Save everything to `schemas/parsed-schema.json`
When prompted press 2 — Yes, allow all edits this session.
Schema loaded — 3 tables, N events discovered. Ready for /monitor or /query.
- customEvents — N events (batch + DMF)
- exceptions — N types
- pageViews — N form events
Edit `schemas/thresholds.json` to tune warning/critical levels for your environment.
> /query "show me the last DMF export"
> /monitor
The 4 specialist agents spawn in parallel, supported by the 3 sub-agents. Every 60 minutes: reports written, alerts filed, dashboard generated.
# Windows
start /B claude --dangerously-skip-permissions --print "/monitor 60" > monitor.log 2>&1
# Mac/Linux
nohup claude --dangerously-skip-permissions --print "/monitor 60" > monitor.log 2>&1 &
# Check on it
type monitor.log # Windows
tail -f monitor.log # Mac/Linux
# Stop in the morning
taskkill /IM claude.exe /F # Windows
kill $(pgrep claude) # Mac/Linux
Auto-generated on first /load-schema. Edit to tune for your environment — no agent files need changing.
{
"batch": {
"slow_job_warning_ms": 60000,
"slow_job_critical_ms": 300000,
"thread_utilisation_warning_pct": 75,
"thread_utilisation_critical_pct": 95,
"queue_depth_warning": 10,
"queue_depth_critical": 50,
"throttle_critical_per_hour": 3
},
"dmf": {
"job_duration_warning_ms": 300000,
"job_duration_critical_ms": 1800000,
"staging_error_same_entity_critical": 3
},
"exceptions": {
"rate_warning_per_minute": 10,
"rate_critical_per_minute": 50
},
"forms": {
"p95_warning_ms": 3000,
"p95_critical_ms": 10000
},
"general": {
"lookback_window_minutes": 60,
"max_rows_per_query": 100
}
}
Common tuning examples:
- Batch jobs normally take 3 min → set `slow_job_warning_ms` to `180000`
- DMF exports legitimately take 15 min → set `job_duration_warning_ms` to `900000`
- Run every 30 min → set `lookback_window_minutes` to `30`
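To see what tuning a threshold changes, consider this small Python sketch that classifies one batch job duration against the warning and critical levels. The `classify` function and the strict greater-than comparison are assumptions modelled on the `> 60000` check shown earlier. The threshold keys and values come from `thresholds.json`.

```python
def classify(value_ms: int, warning_ms: int, critical_ms: int) -> str:
    """Return 'critical', 'warning', or 'ok' for a higher-is-worse metric."""
    if value_ms > critical_ms:
        return "critical"
    if value_ms > warning_ms:
        return "warning"
    return "ok"


defaults = {"slow_job_warning_ms": 60000, "slow_job_critical_ms": 300000}
# Tuned for an environment where batch jobs normally take 3 minutes.
tuned = dict(defaults, slow_job_warning_ms=180000)

job_ms = 120000  # a two-minute batch job
default_result = classify(
    job_ms, defaults["slow_job_warning_ms"], defaults["slow_job_critical_ms"]
)
tuned_result = classify(
    job_ms, tuned["slow_job_warning_ms"], tuned["slow_job_critical_ms"]
)
print(default_result, tuned_result)
```

With the defaults the two-minute job is flagged as a warning. After raising `slow_job_warning_ms` to `180000` it is no longer flagged.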
Auto-generated by schema-analyst — do not edit manually. Re-run /load-schema to refresh.
Contains everything agents need:
- Connection details (from `active.json`)
- All thresholds (from `thresholds.json`)
- Discovered tables, event names, column names
- Field mappings per event (duration field, status field, entity field)
- Warnings about your specific environment
Every agent reads exclusively from this file. No connection details or field names exist anywhere else.
| Agent | Type | Purpose |
|---|---|---|
| `schema-analyst` | sub-task | Runs once. Discovers tables, events, columns live from App Insights. Auto-generates thresholds.json |
| `batch-agent` | specialist | Batch jobs — uses discovered event names and duration field |
| `dmf-agent` | specialist | DMF exports/imports — uses discovered status field and correlation field |
| `exception-agent` | specialist | X++ exceptions — uses discovered column names |
| `form-agent` | specialist | Form load times — uses discovered duration and name fields |
| `kql-generator` | sub-agent | Writes KQL from plain English using schema only — never hardcodes |
| `query-runner` | sub-agent | Executes KQL via az rest — reads App ID from parsed-schema only |
| `insights-writer` | sub-agent | Applies thresholds from parsed-schema — writes reports, alerts, dashboard |
| Command | Description |
|---|---|
| `/load-schema` | Run this first. Discovers schema, auto-generates thresholds.json |
| `/monitor` | Start overnight loop. All 7 agents every 60 min + auto dashboard |
| `/monitor 30` | Run every 30 minutes |
| `/query "..."` | One-off question in plain English |
| `/status` | Show recent reports, alert count, last run |
/query "show me the last DMF export"
/query "which batch jobs took more than 1 minute today"
/query "any batch job failures in the last 4 hours"
/query "how many DMF jobs ran in the last 7 days"
/query "what are the slowest forms right now"
/query "any new exception types today"
/query "show batch thread utilisation trend"
| Path | Contents | Written when |
|---|---|---|
| `reports/YYYY-MM-DD/` | Markdown report per agent per cycle | Every cycle, always |
| `alerts/` | JSON alert per warning/critical finding | Only when threshold breached |
| `reports/dashboard.html` | Dark themed HTML dashboard | Auto after every cycle |
| `kql-cache/` | Every generated KQL + results | Every query |
| `run-log.jsonl` | Append-only structured event log | Every agent action |
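An append-only JSONL log stores one JSON object per line, so every agent action becomes a new line and nothing is rewritten. The Python sketch below shows the idea against an in-memory buffer. The record fields (`timestamp`, `agent`, `action`) and the function `append_event` are hypothetical, since the actual fields in `run-log.jsonl` are not documented here.

```python
import io
import json
from datetime import datetime, timezone


def append_event(log, agent: str, action: str, when: datetime) -> None:
    """Write one event as a single JSON line to an open text stream."""
    record = {"timestamp": when.isoformat(), "agent": agent, "action": action}
    log.write(json.dumps(record) + "\n")


# An in-memory buffer stands in for a file opened in append mode.
buffer = io.StringIO()
when = datetime(2025, 1, 15, 2, 0, 0, tzinfo=timezone.utc)
append_event(buffer, "batch-agent", "query_started", when)
append_event(buffer, "batch-agent", "report_written", when)

lines = buffer.getvalue().splitlines()
for line in lines:
    print(json.loads(line))
```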
| | Report | Alert |
|---|---|---|
| Written | Every cycle — always | Only when threshold crossed |
| Format | Full markdown — summary, metrics, KQL, recommendations | Short JSON — severity, title, detail, recommendation |
| Purpose | Morning reading — full story | Immediate action — urgent findings |
| Analogy | Shift handover document | Pager notification |
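For a feel of the short JSON alert format, here is a Python sketch that builds an alert with the four fields named in the table (severity, title, detail, recommendation) and a file name in the `alerts/alert-{agent}-{timestamp}.json` shape. The timestamp format, the sample wording, and the function `build_alert` are assumptions. The exact JSON that insights-writer produces may differ.

```python
import json
from datetime import datetime


def build_alert(agent: str, severity: str, title: str, detail: str,
                recommendation: str, when: datetime) -> tuple[str, str]:
    """Return (file name, JSON text) for one alert."""
    stamp = when.strftime("%Y%m%dT%H%M%S")  # timestamp format is an assumption
    name = f"alerts/alert-{agent}-{stamp}.json"
    payload = {
        "severity": severity,
        "title": title,
        "detail": detail,
        "recommendation": recommendation,
    }
    return name, json.dumps(payload, indent=2)


# Hypothetical finding used only to show the shape of an alert.
alert_name, alert_body = build_alert(
    agent="batch-agent",
    severity="critical",
    title="Slow batch job",
    detail="Job ran for 310000 ms (critical threshold 300000 ms)",
    recommendation="Review the job's workload and schedule",
    when=datetime(2025, 1, 15, 2, 0, 0),
)
print(alert_name)
print(alert_body)
```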
d365-observability-hub/
├── CLAUDE.md ← Orchestrator brain (always loaded)
├── .claude/
│ ├── settings.json ← Tool permissions
│ ├── agents/
│ │ ├── schema-analyst.md ← Discovers schema + auto-generates thresholds
│ │ ├── batch-agent.md ← Batch monitoring (schema-driven)
│ │ ├── dmf-agent.md ← DMF monitoring (schema-driven)
│ │ ├── exception-agent.md ← Exception monitoring (schema-driven)
│ │ ├── form-agent.md ← Form monitoring (schema-driven)
│ │ ├── kql-generator.md ← KQL from plain English (schema-driven)
│ │ ├── query-runner.md ← Executes KQL (App ID from schema only)
│ │ └── insights-writer.md ← Reports + alerts + dashboard
│ └── commands/
│ ├── monitor.md
│ ├── query.md
│ ├── status.md
│ └── load-schema.md
├── schemas/
│ ├── active.json ← YOU edit this (App ID + tenant only)
│ ├── thresholds.json ← Auto-generated. Edit to tune.
│ └── parsed-schema.json ← Auto-generated. Do not edit.
├── reports/ ← Runtime (gitignored)
├── alerts/ ← Runtime (gitignored)
├── kql-cache/ ← Runtime (gitignored)
├── docs/
│ └── D365-Observability-Hub.gif
├── .env.example
├── SETUP.md
├── CONTRIBUTING.md
└── README.md
| Problem | Fix |
|---|---|
| `/load-schema` fails 403 | Your Azure account needs Reader access on App Insights resource |
| `/monitor` not recognised | You are outside Claude Code. Run claude first |
| Agents asking permission repeatedly | Press 2 (allow all edits this session) or use --dangerously-skip-permissions |
| Simulate mode instead of live data | Run az login --tenant YOUR_TENANT before starting claude |
| Reports folder empty | Wait ~5 min for first cycle to complete after /monitor |
| thresholds.json not generated | Check schemas/active.json has valid App ID then re-run /load-schema |
| parsed-schema.json stale | Re-run /load-schema — recommended every 7 days |
| Banner not showing | Type hello at the > prompt |
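The 7-day refresh recommendation for `parsed-schema.json` can be expressed as a simple age check. The Python sketch below is one hypothetical way to do it. The function `is_stale` and the idea of comparing against a recorded load time are assumptions, not something the project is documented to do.

```python
from datetime import datetime, timedelta


def is_stale(last_loaded: datetime, now: datetime, max_age_days: int = 7) -> bool:
    """True when the schema was last loaded more than max_age_days ago."""
    return now - last_loaded > timedelta(days=max_age_days)


now = datetime(2025, 1, 15)
stale_after_14_days = is_stale(datetime(2025, 1, 1), now)
stale_after_5_days = is_stale(datetime(2025, 1, 10), now)
print(stale_after_14_days, stale_after_5_days)
```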
Works on any Azure App Insights resource — not just D365 FO.
To point at a different environment:
- Update `schemas/active.json` with the new App ID
- Delete `schemas/thresholds.json` and `schemas/parsed-schema.json`
- Run `/load-schema` — everything rediscovered automatically
No agent files need editing.
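The delete step when switching environments can be scripted. The Python sketch below removes the two auto-generated files and leaves `active.json` alone. It runs against a throwaway temporary directory so it is safe to try. The function `reset_generated` is hypothetical and not part of the project.

```python
import tempfile
from pathlib import Path

GENERATED = ["schemas/thresholds.json", "schemas/parsed-schema.json"]


def reset_generated(root: Path) -> list[str]:
    """Delete the auto-generated schema files under root; return what was removed."""
    removed = []
    for rel in GENERATED:
        target = root / rel
        if target.exists():
            target.unlink()
            removed.append(rel)
    return removed


# Demonstrate against a throwaway directory rather than a real checkout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "schemas").mkdir()
    for name in ("active.json", "thresholds.json", "parsed-schema.json"):
        (root / "schemas" / name).write_text("{}")
    removed = reset_generated(root)
    remaining = sorted(p.name for p in (root / "schemas").iterdir())

print(removed, remaining)
```

After this, running /load-schema regenerates both files for the new App ID.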
- Claude Code — Anthropic's CLI agent framework
- Azure Application Insights — D365 telemetry
- Azure CLI — `az rest` for Azure AD auth
- KQL — Kusto Query Language
Prashant Verma — Principal Consultant, AI Business Solutions
MIT — see LICENSE for details.
