erp-sql-toolkit

Production-pattern T-SQL toolkit: idempotent imports, audit triggers, SCD Type 2, analytics KPIs, maintenance helpers, data-quality checks. Battle-tested patterns from enterprise ERP work, rebuilt clean-room on synthetic data.

What's in here

Module	What it gives you	Key file
Schema	Core dims & facts (users, programs, attendance, invoices, payments) on a clean star layout	`schema/001-core-schema.sql`
Import log	Audit table + start/finish procs used by every idempotent import	`schema/002-import-log.sql`
Audit framework	Generic row-level audit table + trigger generator (JSON before/after)	`schema/003-audit-framework.sql`
SCD2 dimensions	Slowly-Changing-Dimension Type 2 schema for student history	`schema/004-scd2-dimensions.sql`
Idempotent imports	`sp_import_attendance` — re-runnable with same `batch_id`, zero duplicates	`procedures/imports/sp_import_attendance.sql`
Daily close	Roll-up of raw punches → one row per user-day with status + duration	`procedures/imports/sp_close_attendance_day.sql`
SCD2 merge	Handles brand-new keys, unchanged rows (no-op), changes, late-arriving rows	`procedures/scd2/sp_merge_dim_student.sql`
Analytics KPIs	Absenteeism, fee collection rate, aging buckets, per-program drill	`procedures/analytics/`
Maintenance	Index health, smart reindex, missing-index DMV report, unused-index report	`procedures/maintenance/`
Data quality	Generic orphan finder, generic duplicate finder	`procedures/dq/`
Functions	`fn_business_days_between`, `fn_age_in_years`, `fn_split_csv`, `fn_date_range`	`functions/`
Views	`vw_attendance_daily`, `vw_finance_summary` — Power BI / Excel friendly	`views/`
tSQLt tests	Unit tests for imports, SCD2, DQ	`tests/`
Seed data	~1000 rows synthetic data — students, punches, invoices	`samples/seed-small.sql`
Docs	T-SQL conventions, idempotency pattern, SCD2 pattern	`docs/`

The featured pattern — Idempotent imports

The single most valuable script in this toolkit. Most enterprise import SPs in the wild have a hidden bug: re-running them inserts duplicates.

-- Re-runnable.  Same @batch_id = zero new rows.
EXEC core.sp_import_attendance
     @batch_id    = N'BATCH-2026-05-23',
     @source_file = N'punches.csv';

The procedure:

Writes a RUNNING row to audit.import_log
Idempotency guard: if (batch_id, operation, status=COMPLETED) already exists → returns SKIPPED_IDEMPOTENT
Joins staging → users → computes SHA2_256 row hash
INSERT ... WHERE NOT EXISTS against the unique index on (user_id, punch_at, device_id, direction)
Marks audit row COMPLETED with @@ROWCOUNT, or FAILED with error + THROW

Full story → docs/idempotency-pattern.md

Quick start

Prerequisites

SQL Server 2019+ (LocalDB / Express / Developer all work)
sqlcmd on PATH (ships with SQL Server)

Run the whole thing (one command, ~60 seconds)

# From repo root:
cd tools
sqlcmd -S "(localdb)\MSSQLLocalDB" -i run-everything.sql -v ServerType=LocalDB

What it does:

Drops/creates ERPToolkitDemo database
Applies all schema scripts in order
Creates every proc, function, and view
Seeds ~1000 rows of synthetic data
Closes a sample day & prints KPIs

Apply piecemeal (for review)

sqlcmd -S "(localdb)\MSSQLLocalDB" -i schema/000-setup-database.sql
sqlcmd -S "(localdb)\MSSQLLocalDB" -i schema/001-core-schema.sql
# ...etc — see tools/run-everything.sql for the canonical order

Try the idempotency demo

USE ERPToolkitDemo;

-- First run inserts rows
EXEC core.sp_import_attendance @batch_id = N'DEMO-1';
SELECT COUNT(*) AS row_count_after_first_run FROM core.fact_punch WHERE batch_id = N'DEMO-1';

-- Second run is a no-op — zero new rows
EXEC core.sp_import_attendance @batch_id = N'DEMO-1';
SELECT COUNT(*) AS row_count_after_second_run FROM core.fact_punch WHERE batch_id = N'DEMO-1';

-- Audit trail
SELECT TOP 5 batch_id, status, rows_affected, started_at, finished_at
FROM   audit.import_log
WHERE  operation = N'IMPORT_ATTENDANCE'
ORDER  BY log_id DESC;

Try the KPI procedures

USE ERPToolkitDemo;

EXEC mart.sp_close_attendance_day @for_date = '2026-04-15';

EXEC mart.sp_kpi_absenteeism
     @from_date = '2026-04-01',
     @to_date   = '2026-04-30';

EXEC mart.sp_kpi_collection_rate
     @from_date = '2026-04-01',
     @to_date   = '2026-04-30';

Run the tests

-- 1. Install tSQLt manually — see tools/install-tsqlt.sql
-- 2. Then:
USE ERPToolkitDemo;
EXEC tSQLt.Run 'imports';
EXEC tSQLt.Run 'scd2_tests';
EXEC tSQLt.Run 'dq_tests';

-- Or run everything:
EXEC tSQLt.RunAll;

Folder structure

erp-sql-toolkit/
├── schema/                          # versioned DDL, run in order
│   ├── 000-setup-database.sql       # drop/create ERPToolkitDemo + schemas
│   ├── 001-core-schema.sql          # dims & facts
│   ├── 002-import-log.sql           # audit.import_log + start/finish procs
│   ├── 003-audit-framework.sql      # audit.row_audit + trigger generator
│   └── 004-scd2-dimensions.sql      # SCD Type 2 dimension table
├── procedures/
│   ├── imports/
│   │   ├── sp_import_attendance.sql # the flagship idempotent import
│   │   └── sp_close_attendance_day.sql
│   ├── scd2/
│   │   └── sp_merge_dim_student.sql # SCD2 merge with late-arrival handling
│   ├── analytics/
│   │   ├── sp_kpi_absenteeism.sql
│   │   └── sp_kpi_collection_rate.sql
│   ├── maintenance/
│   │   ├── sp_index_health.sql
│   │   ├── sp_reindex_smart.sql     # online rebuild >30%, reorg 5..30%
│   │   ├── sp_missing_indexes.sql   # DMV-based suggestions
│   │   └── sp_unused_indexes.sql    # candidates to drop
│   └── dq/
│       ├── sp_find_orphans.sql      # generic, works on any (child,parent) pair
│       └── sp_find_duplicates.sql   # generic, works on any column set
├── functions/
│   ├── fn_business_days_between.sql
│   ├── fn_age_in_years.sql
│   ├── fn_split_csv.sql             # inline TVF
│   └── fn_date_range.sql            # inline TVF
├── views/
│   ├── vw_attendance_daily.sql      # Power BI / Excel friendly
│   └── vw_finance_summary.sql
├── tests/                           # tSQLt
│   ├── test_imports.sql
│   ├── test_scd2.sql
│   └── test_dq.sql
├── samples/
│   └── seed-small.sql               # ~1000 rows synthetic
├── tools/
│   ├── install-tsqlt.sql            # instructions only — fetch tSQLt yourself
│   └── run-everything.sql           # single-shot setup
├── docs/
│   ├── conventions.md               # naming + safety conventions used here
│   ├── idempotency-pattern.md       # why & how
│   └── scd2-pattern.md              # SCD2 with late-arrival handling
├── BUILD-PLAN.md                    # the spec this was built from
├── LICENSE
└── README.md

Conventions I follow (and why)

Rule	Why
`snake_case` for all identifiers	ANSI/Postgres compatible; no quoting
`core` / `stg` / `audit` / `mart` / `util` / `dq` schemas	Purpose-of-data, not technology
Every SP starts with `SET NOCOUNT ON; SET XACT_ABORT ON;`	Fewer roundtrips + safe transactions
Every write wrapped in `TRY ... CATCH ... ROLLBACK ... THROW`	Atomic, observable failures
`DATETIME2(0)` everywhere, never `DATETIME`	Smaller, more accurate, ANSI
No `SELECT *` in production code	Schema-drift safety
`INSERT ... WHERE NOT EXISTS` over naive `MERGE`	Avoids the canonical MERGE race

Full list → docs/conventions.md

What I learned building this

Idempotency must be designed in, not patched on. The audit row + uniqueness key + WHERE NOT EXISTS pattern is short, but every part of it earns its place. Any one of them missing and re-runs misbehave.
The MERGE race condition is the most common production bug in T-SQL data engineering. Most teams write MERGE without HOLDLOCK because the syntax looks complete. It isn't.
Row hashes (HASHBYTES('SHA2_256', concat_ws(...))) are the difference between an SCD2 merge that's fast and one that's pathological. Comparing business attributes via hash means no-op merges are a single comparison, not a full row diff.
Audit logs that are also the idempotency mechanism are double-duty wins. One write does both jobs.

Roadmap

CDC-aware version of sp_import_attendance (consume Change Data Capture instead of staging)
Always-on / availability-group-aware variants of sp_reindex_smart
Column-store fact partitioning examples
Parameterized data generator (CLI param: n_users, n_days)
PostgreSQL port for the patterns (FOR JSON → JSON_AGG, etc.)

Disclaimer

This is a clean-room reference implementation. No code, schema, screenshot, or data from any employer or client appears in this repository. All data is synthetic, generated by the included seeder.

License

MIT — see LICENSE.

Author: Yugandhar N — Junior Software Developer, Fresh B.Tech Grad (Apr 2026)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

erp-sql-toolkit

What's in here

The featured pattern — Idempotent imports

Quick start

Prerequisites

Run the whole thing (one command, ~60 seconds)

Apply piecemeal (for review)

Try the idempotency demo

Try the KPI procedures

Run the tests

Folder structure

Conventions I follow (and why)

What I learned building this

Roadmap

Disclaimer

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
.github		.github
docs		docs
functions		functions
procedures		procedures
samples		samples
schema		schema
tests		tests
tools		tools
views		views
.editorconfig		.editorconfig
.gitattributes		.gitattributes
.gitignore		.gitignore
BUILD-PLAN.md		BUILD-PLAN.md
LICENSE		LICENSE
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

erp-sql-toolkit

What's in here

The featured pattern — Idempotent imports

Quick start

Prerequisites

Run the whole thing (one command, ~60 seconds)

Apply piecemeal (for review)

Try the idempotency demo

Try the KPI procedures

Run the tests

Folder structure

Conventions I follow (and why)

What I learned building this

Roadmap

Disclaimer

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages