PatrickGallucci/fda-observability

FDA Observability

Production observability for a Microsoft Fabric Data Agent (FDA) that runs NL2DAX over a Power BI semantic model and is surfaced to end users through Microsoft 365 Copilot.

It captures, per interaction, as much of the troubleshooting/tuning triple as the platform exposes:

question → (rephrase / grounding) → generated DAX → executed DAX (+ perf) → answer → user → timestamp

and lands it in a Fabric Eventhouse (KQL), with a C# WinForms review/search/config app on top.
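To make the shape of one captured interaction concrete, here is a minimal sketch of a curated record as a Python dataclass. The field names are hypothetical and are not taken from the real FdaInteractions schema in fabric/kql/. User and timestamp are required because they form the correlation key. Stages the platform only partially exposes for live M365 calls are Optional.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Interaction:
    """One question → DAX → answer record (hypothetical field names)."""
    user: str                          # e.g. UPN; part of the correlation key
    timestamp: datetime                # UTC; part of the correlation key
    question: str                      # prompt text
    rephrased: Optional[str] = None    # rephrase / grounding, often not exposed live
    generated_dax: Optional[str] = None
    executed_dax: Optional[str] = None
    duration_ms: Optional[int] = None  # perf of the executed query
    answer: Optional[str] = None

    def missing_parts(self) -> list[str]:
        """Names of the optional stages that were not captured for this row."""
        optional = ["rephrased", "generated_dax", "executed_dax", "duration_ms", "answer"]
        return [name for name in optional if getattr(self, name) is None]
```

For example, a row with only the prompt, executed DAX and answer filled in reports `["rephrased", "generated_dax", "duration_ms"]` as missing. This is a convenient way to measure coverage per source.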

📖 Full documentation: https://patrickgallucci.github.io/fda-observability/


What is actually capturable for M365-originated calls (the honest matrix)

| Part of the triple | Best programmatic source | Status | Notes |
|---|---|---|---|
| User question (prompt) | MS Graph aiInteractionHistory → text; Office 365 Mgmt API → index | GA | Graph gives the full text; the Mgmt API gives metadata + thread id |
| Rephrased question / grounding | Workspace-monitoring ApplicationContext; FDA SDK run-steps (replay only) | Partial | Internal chain-of-thought is not exposed for live M365 calls |
| Generated / executed DAX | Semantic-model workspace monitoring (EventText = DAX, executed over XMLA) | GA | The most reliable programmatic DAX source for M365 traffic |
| Answer (response) | MS Graph aiInteractionHistory; Purview DSPM / Activity Explorer | GA / Preview | Graph returns the response body |
| User + timestamp | All three sources | GA | Used as the correlation key |
| Full DAX as the agent generated it + reasoning | Purview DSPM for AI / Activity Explorer; FDA Python SDK (replay) | Preview | Activity Explorer shows the generated queries, but they are not yet cleanly exportable through an API |

Design consequence: no single API returns the whole triple for live M365 calls. We capture from multiple surfaces and correlate by user + time-window + workspace/model. See docs/architecture.md.
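The correlation step can be sketched as follows. This is a minimal illustration and not the collector's actual logic. It assumes that user identifiers from Graph and from workspace monitoring can be compared case-insensitively. It also uses a hypothetical 2-minute window, and the `Question`, `DaxEvent` and `correlate` names are made up for this example. Each DAX event is attributed to the most recent prior question from the same user against the agent's workspace/model. Attribution stops at the window edge or at that user's next question, whichever comes first.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Question:
    user: str                  # user id from the Graph interaction
    ts: datetime
    text: str
    dax: list[str] = field(default_factory=list)


@dataclass
class DaxEvent:
    user: str                  # executing user from workspace monitoring
    ts: datetime
    workspace: str
    model: str
    text: str                  # EventText (the DAX query)


def correlate(questions: list[Question], events: list[DaxEvent], workspace: str,
              model: str, window: timedelta = timedelta(minutes=2)) -> list[Question]:
    """Attach each DAX event to the question that most plausibly produced it.

    An event matches a question when the users are equal (case-insensitive),
    it targets the agent's workspace/model, and it starts no earlier than the
    question and before min(question + window, that user's next question).
    """
    events_by_user: dict[str, list[DaxEvent]] = defaultdict(list)
    for ev in events:
        if ev.workspace == workspace and ev.model == model:
            events_by_user[ev.user.casefold()].append(ev)
    for evs in events_by_user.values():
        evs.sort(key=lambda item: item.ts)

    questions_by_user: dict[str, list[Question]] = defaultdict(list)
    for q in questions:
        questions_by_user[q.user.casefold()].append(q)

    for user, qs in questions_by_user.items():
        qs.sort(key=lambda item: item.ts)
        user_events = events_by_user.get(user, [])
        for i, q in enumerate(qs):
            end = q.ts + window
            if i + 1 < len(qs):
                end = min(end, qs[i + 1].ts)  # never steal the next question's DAX
            q.dax.extend(e.text for e in user_events if q.ts <= e.ts < end)
    return questions
```

A question can collect several DAX events, because the agent may issue more than one query per answer. In practice, the window and a small allowance for clock skew between sources would need tuning against real traffic.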


Components

fabric/kql/        Eventhouse schema: raw landing tables, curated FdaInteractions, update policies, analyst queries
fabric/notebooks/  FDA_Collector  — scheduled Fabric notebook (service principal); pulls all sources, queued ingest,
                                    Entra group expansion for the Graph user set
                   FDA_SDK_Replay — optional offline harness for full reasoning/step capture on sampled questions
fabric/dashboards/ FDA_Observability_RTI_Dashboard.json — Real-Time Dashboard (fleet health & tuning trends) + query pack
deploy/            Deploy-FdaObservability.ps1 — provisions the Eventhouse schema + schedules the collector notebook
src/               FdaObservability.App  — .NET 8 WinForms review + search + configuration app (interactive AAD)
                   FdaObservability.Core — cross-platform query client shared by all surfaces
                   FdaObservability.Api  — ASP.NET Core REST API + OpenAPI/Swagger (+ Dockerfile)
                   FdaObservability.Mcp  — MCP stdio server (AI agents/harnesses)
                   FdaObservability.Cli  — `fda-obs` cross-platform CLI (dotnet tool)
docs/              architecture.md, setup-tenant-and-identity.md
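The "Entra group expansion for the Graph user set" mentioned for FDA_Collector flattens nested groups into the set of users whose interactions are exported. Below is a minimal breadth-first sketch over a hypothetical in-memory membership map. A real collector would read memberships from Microsoft Graph, which also offers a transitiveMembers relationship on groups that does this flattening server-side. The `expand_users` name and the `(kind, id)` tuple format are assumptions for illustration.

```python
from collections import deque


def expand_users(root_groups: list[str], members: dict[str, list[tuple[str, str]]]) -> list[str]:
    """Flatten nested groups into a sorted, de-duplicated list of user ids.

    `members` maps a group id to (kind, id) tuples, where kind is "user" or
    "group". Visited groups are tracked, so membership cycles terminate.
    """
    users: set[str] = set()
    seen: set[str] = set()
    queue = deque(root_groups)
    while queue:
        gid = queue.popleft()
        if gid in seen:
            continue
        seen.add(gid)
        for kind, oid in members.get(gid, []):
            if kind == "user":
                users.add(oid)
            elif kind == "group":
                queue.append(oid)
    return sorted(users)
```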

FDA_SDK_Replay lands its reasoning/step output in the Raw_SdkRuns table, so it sits alongside the production triple for side-by-side comparison (see the SDK-vs-production query in fabric/kql/03_queries.kql).

The WinForms app is for per-interaction drill-down; the Real-Time Dashboard is for fleet-level trends. Both read the same FdaInteractions table.

Deploy order

  1. Tenant + identity — enable settings, register the collector app, grant permissions, provision the Eventhouse. Follow docs/setup-tenant-and-identity.md.
  2. Eventhouse schema — run fabric/kql/01_tables.kql and then fabric/kql/02_policies.kql in the KQL database query editor. Alternatively, run deploy/Deploy-FdaObservability.ps1 -ClusterQueryUri <uri> -ProvisionSchema to apply both in one step; add -ScheduleNotebook -WorkspaceId <id> -NotebookId <id> to also schedule the collector.
  3. Collector — import fabric/notebooks/FDA_Collector.py into a Fabric notebook, set parameters, run once, then schedule (e.g. every 15–60 min).
  4. WinForms app — open src/FdaObservability.sln, build, run, complete the in-app Configuration dialog (capacity / workspace / agent / Eventhouse URI), then review.
  5. Dashboard — import fabric/dashboards/FDA_Observability_RTI_Dashboard.json and set its data source to the Eventhouse. See fabric/dashboards/README.md.
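Because the collector runs on a schedule (step 3), each run has to decide which time range to pull. This README does not say how FDA_Collector tracks that. One common pattern, sketched here purely as an assumption, is a persisted high-water mark combined with a small overlap, so that late-arriving audit or monitoring records are not missed. The resulting duplicates would then need de-duplication on ingest. The function name and both defaults are hypothetical.

```python
from datetime import datetime, timedelta


def next_pull_window(watermark: datetime, now: datetime,
                     overlap: timedelta = timedelta(minutes=10),
                     max_span: timedelta = timedelta(hours=24)) -> tuple[datetime, datetime]:
    """Return (start, end) for the next incremental pull.

    start re-reads `overlap` before the last watermark to catch late records;
    end is capped at `max_span` after start, so that after an outage the gap is
    back-filled in bounded chunks. Persist `end` as the new watermark only
    after the batch has been ingested successfully.
    """
    start = watermark - overlap
    end = min(now, start + max_span)
    return start, end
```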

Prerequisites (summary)

  • Fabric capacity F2+ (or P1+ with Fabric enabled), and an FDA published over a Power BI semantic model.
  • Workspace monitoring enabled on the workspace hosting the semantic model (creates the monitoring Eventhouse).
  • An Entra app registration (the collector) with:
      ◦ AiEnterpriseInteraction.Read.All (Graph, application)
      ◦ GroupMember.Read.All (Graph, application; only needed if you use Entra group expansion for the Graph user set)
      ◦ ActivityFeed.Read (Office 365 Management API, application)
      ◦ Contributor or Member on the observability workspace so it can ingest into the Eventhouse, plus Database Ingestor on the KQL database for queued ingestion
  • M365 Copilot licensing for the users whose interactions you export via Graph.
  • .NET 8 SDK + Windows for the WinForms app.

Several capture paths are in preview (Purview DSPM for AI audit for FDA, FDA SDK). The collector degrades gracefully: an unavailable source is logged and skipped rather than failing the whole run.
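The skip-and-log behaviour can be sketched like this. The example is illustrative only: the source names and the `collect_all` helper are hypothetical, and each source is assumed to be a zero-argument callable that returns records. Every source runs inside its own try/except. A failure is logged with its traceback and recorded as skipped, and the remaining sources still run.

```python
import logging
from typing import Callable, Iterable

log = logging.getLogger("fda_collector")


def collect_all(sources: dict[str, Callable[[], Iterable[dict]]]) -> tuple[list[dict], list[str]]:
    """Run every source; an unavailable one is logged and skipped, never fatal.

    Returns the combined records (each tagged with its source name) and the
    names of the sources that failed.
    """
    records: list[dict] = []
    skipped: list[str] = []
    for name, fetch in sources.items():
        try:
            batch = list(fetch())  # materialize so a mid-stream failure is caught here
        except Exception:  # broad on purpose: one failing source must not abort the run
            log.warning("source %s unavailable; skipping", name, exc_info=True)
            skipped.append(name)
            continue
        records.extend({**rec, "source": name} for rec in batch)
    return records, skipped
```

Returning the skipped list alongside the records lets a run write a health row, so gaps from preview sources stay visible instead of silently missing.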