donthula9908/microsoft-fabric-analytics

Microsoft Fabric Analytics Platform

Microsoft Fabric | Python | PySpark | SQL | Power BI

📌 Overview

End-to-end Microsoft Fabric analytics platform built for a financial reporting team. It replaces a legacy on-prem SQL Server + SSRS stack with a Fabric Lakehouse, Dataflow Gen2, and a Fabric Warehouse, cutting report delivery from an 8-hour overnight run to under 20 minutes.


🏗️ Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                        DATA SOURCES                                 │
│  ┌──────────┐  ┌──────────┐  ┌──────────────┐  ┌───────────────┐  │
│  │SQL Server│  │Dynamics  │  │  SharePoint  │  │   REST APIs   │  │
│  │(On-prem) │  │365 F&O   │  │   Excel      │  │               │  │
│  └────┬─────┘  └────┬─────┘  └──────┬───────┘  └──────┬────────┘  │
└───────┼─────────────┼────────────────┼─────────────────┼───────────┘
        │             │                │                 │
        ▼             ▼                ▼                 ▼
┌─────────────────────────────────────────────────────────────────────┐
│                     DATAFLOW GEN2 (ELT)                             │
│       Power Query M / Python transforms → Lakehouse Tables          │
└────────────────────────────┬────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────────┐
│                    FABRIC LAKEHOUSE (OneLake)                        │
│                                                                     │
│   ┌─────────────┐    ┌─────────────┐    ┌─────────────────────┐   │
│   │ Bronze Files│    │Silver Tables│    │  Gold Delta Tables  │   │
│   │  (raw files)│    │(cleansed)   │    │  (business-ready)   │   │
│   └─────────────┘    └─────────────┘    └─────────────────────┘   │
└────────────────────────────┬────────────────────────────────────────┘
                             │
               ┌─────────────┴──────────────┐
               ▼                            ▼
┌──────────────────────────┐   ┌────────────────────────────┐
│   FABRIC WAREHOUSE       │   │  FABRIC NOTEBOOK (PySpark) │
│   (T-SQL Analytics)      │   │  ML / Advanced Analytics   │
└──────────────┬───────────┘   └────────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────────────────────────────────┐
│              POWER BI (DirectLake Mode)                             │
│     Executive Dashboards │ Self-service Reports │ Alerts            │
└─────────────────────────────────────────────────────────────────────┘
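
The Bronze → Silver → Gold flow in the diagram can be sketched in plain Python. This is only an illustration of the layering idea, not the notebooks' actual PySpark code: the column names (`account`, `period`, `amount`), the sample rows, and the functions `to_silver` and `to_gold` are all hypothetical.

```python
from collections import defaultdict
from decimal import Decimal, InvalidOperation

# Hypothetical raw rows as they might land in the Bronze layer:
# everything is a string and nothing has been validated yet.
bronze_rows = [
    {"account": " 4000 ", "period": "2024-01", "amount": "1250.50"},
    {"account": "4000", "period": "2024-01", "amount": "749.50"},
    {"account": "5000", "period": "2024-01", "amount": "n/a"},  # unparseable
    {"account": "5000", "period": "2024-02", "amount": "300.00"},
]


def to_silver(rows):
    """Cleanse: trim the account key, parse amounts, drop unparseable rows."""
    silver = []
    for row in rows:
        try:
            amount = Decimal(row["amount"])
        except InvalidOperation:
            continue  # a real pipeline would likely quarantine the row instead
        silver.append(
            {
                "account": row["account"].strip(),
                "period": row["period"],
                "amount": amount,
            }
        )
    return silver


def to_gold(rows):
    """Aggregate: total amount per (account, period), ready for reporting."""
    totals = defaultdict(Decimal)
    for row in rows:
        totals[(row["account"], row["period"])] += row["amount"]
    return dict(totals)


silver_rows = to_silver(bronze_rows)
gold_totals = to_gold(silver_rows)

for (account, period), total in sorted(gold_totals.items()):
    print(account, period, total)
```

In a Fabric notebook the same two steps would typically be DataFrame transformations written to Delta tables; the plain-Python version just makes the responsibilities of each layer visible.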

📁 Repository Structure

microsoft-fabric-analytics/
│
├── lakehouse-notebooks/
│   ├── 01_bronze_ingestion.ipynb       # Raw data ingestion to Files
│   ├── 02_silver_transformations.ipynb # Cleansing & normalisation
│   ├── 03_gold_business_entities.ipynb # Business aggregations
│   └── utils/
│       ├── lakehouse_utils.py          # Fabric lakehouse helper functions
│       └── schema_validation.py        # Schema enforcement utilities
│
├── dataflow-gen2/
│   ├── df_ingest_sql_server.json       # SQL Server → Lakehouse dataflow
│   ├── df_ingest_dynamics365.json      # D365 → Lakehouse dataflow
│   └── df_transform_financial.json     # Financial data transformations
│
└── fabric-warehouse-sql/
    ├── ddl/
    │   ├── create_dim_tables.sql        # Dimension table DDL
    │   ├── create_fact_tables.sql       # Fact table DDL
    │   └── create_views.sql            # Reporting views
    ├── stored_procedures/
    │   ├── usp_load_dim_customer.sql
    │   ├── usp_load_fact_financials.sql
    │   └── usp_refresh_reporting_layer.sql
    └── data_quality/
        └── dq_checks.sql               # Row count, null, range checks
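
The three check types listed for `dq_checks.sql` (row count, null, range) can be illustrated with a small Python sketch. The repository implements these checks in SQL; the function `run_dq_checks`, its parameters, and the sample rows below are hypothetical and only show the shape of the logic.

```python
def run_dq_checks(rows, required, ranges, min_rows=1):
    """Return a list of human-readable failures; an empty list means all checks passed.

    rows     -- list of dicts, one per record
    required -- column names that must not be None (null check)
    ranges   -- {column: (low, high)} inclusive bounds (range check)
    min_rows -- minimum acceptable number of records (row count check)
    """
    failures = []

    # Row count check
    if len(rows) < min_rows:
        failures.append(f"row_count: expected at least {min_rows}, got {len(rows)}")

    for index, row in enumerate(rows):
        # Null check
        for column in required:
            if row.get(column) is None:
                failures.append(f"null: row {index}, column {column}")

        # Range check (nulls are reported by the null check, so skip them here)
        for column, (low, high) in ranges.items():
            value = row.get(column)
            if value is not None and not (low <= value <= high):
                failures.append(f"range: row {index}, column {column}, value {value}")

    return failures


# Hypothetical sample: one clean row, one null customer, one out-of-range amount.
sample_rows = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": None, "amount": 80.0},
    {"customer_id": 3, "amount": -5.0},
]

dq_failures = run_dq_checks(
    sample_rows,
    required=["customer_id", "amount"],
    ranges={"amount": (0.0, 1_000_000.0)},
    min_rows=1,
)

for failure in dq_failures:
    print(failure)
```

Returning a list of failures rather than raising on the first one makes it easier to log every problem in a single run before deciding whether to stop the pipeline.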

⚡ Key Features

  • OneLake as the single storage layer: no data duplication between Lakehouse and Warehouse
  • DirectLake Power BI mode: sub-second query performance without the usual import vs. DirectQuery trade-offs
  • Dataflow Gen2 with 70+ connectors for no-code / low-code ingestion
  • Fabric Notebooks with PySpark for heavy transformation workloads
  • Fabric Warehouse for T-SQL-based BI teams with familiar SQL semantics
  • Fabric Pipelines for orchestration with built-in monitoring and alerting
  • Workspace-level Git integration: version control for Fabric items
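
To make the "single storage layer" point concrete, here is a minimal sketch of a path helper of the kind `lakehouse_utils.py` might contain. It assumes the commonly documented OneLake URI pattern (`abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<lakehouse>.Lakehouse/...`); the function names and the workspace and lakehouse names are hypothetical, so check the pattern against your own workspace before relying on it.

```python
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_path(workspace, lakehouse, area, name):
    """Build an abfss URI for an item inside a Lakehouse.

    area is "Tables" for managed Delta tables or "Files" for raw files.
    """
    if area not in ("Tables", "Files"):
        raise ValueError(f"area must be 'Tables' or 'Files', got {area!r}")
    return f"abfss://{workspace}@{ONELAKE_HOST}/{lakehouse}.Lakehouse/{area}/{name}"


def onelake_table_path(workspace, lakehouse, table):
    """Convenience wrapper for Delta tables."""
    return onelake_path(workspace, lakehouse, "Tables", table)


# Hypothetical workspace and lakehouse names.
gold_table = onelake_table_path("finance-ws", "finance_lh", "fact_financials")
raw_files = onelake_path("finance-ws", "finance_lh", "Files", "bronze/sql_server")

print(gold_table)
print(raw_files)
```

Because Bronze files and Gold tables resolve to paths under the same OneLake root, Spark notebooks and downstream consumers can point at one copy of the data instead of exporting it between systems.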

📊 Results

| Metric | Legacy | Fabric |
|---|---|---|
| Report delivery time | 8 hours (overnight) | 18 minutes |
| Infrastructure cost | $12K/month (on-prem) | $3.2K/month |
| Data freshness | T+1 | Near real-time |
| Time to onboard new data source | 2 weeks | 2 days |
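
The relative improvements implied by these figures can be worked out with a few lines of Python. The inputs come straight from the results above; the helper `pct_reduction` is illustrative, and treating "2 weeks" as 14 calendar days is an assumption.

```python
def pct_reduction(before, after):
    """Percentage reduction going from `before` to `after`."""
    return (before - after) / before * 100


delivery_reduction = pct_reduction(8 * 60, 18)   # minutes: 8 hours vs 18 minutes
cost_reduction = pct_reduction(12_000, 3_200)    # USD per month
onboarding_reduction = pct_reduction(14, 2)      # days, assuming 2 weeks = 14 days

delivery_speedup = (8 * 60) / 18                 # how many times faster
annual_saving = (12_000 - 3_200) * 12            # USD per year

print(f"Report delivery: {delivery_reduction:.2f}% shorter ({delivery_speedup:.1f}x faster)")
print(f"Infrastructure cost: {cost_reduction:.2f}% lower (${annual_saving:,} per year)")
print(f"Onboarding time: {onboarding_reduction:.2f}% shorter")
```

In round numbers, that is about a 96% cut in delivery time, about a 73% cut in monthly infrastructure cost, and $105,600 saved per year.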

🔧 Tech Stack

| Component | Technology |
|---|---|
| Storage | OneLake (ADLS Gen2 compatible) |
| Compute | Fabric Spark / Fabric Warehouse |
| Ingestion | Dataflow Gen2, Fabric Pipelines |
| Processing | PySpark, T-SQL |
| Table Format | Delta Lake (V-Order optimised) |
| Visualisation | Power BI (DirectLake) |
| Orchestration | Fabric Pipelines, Fabric Job Scheduler |
| Version Control | Fabric Git integration (Azure DevOps) |
