Framework Mapping

This document maps the conceptual contributions of the published paper to the modules in this implementation.

Reference: Mudusu, S. K., & Gentyala, S. (2026). Zero-Trust Data Pipelines for AI Systems: A Framework for Secure, Verifiable, and Auditable Data Engineering. Journal of Recent Trends in Computer Science and Engineering, 14(2), 10–25.

Paper section → implementation module

| Paper concept | Module / component | Notes |
|---|---|---|
| Zero-trust ingestion boundary | `ingestion.py` | Checksum, extension guard, size limit |
| Data integrity verification | `ingestion._sha256()` | SHA-256 on raw bytes before parsing |
| Schema validation layer | `validation.py` | Required fields, null counts, duplicate check |
| Policy-driven access control | `policy_engine.py` + `policies.yaml` | Declarative YAML rules, per-rule decisions |
| PII detection and flagging | `policy_engine` (`pii_columns` rule) | Flags presence; does not mask (extension point) |
| Data lineage capture | `lineage.py` (`LineageTracker`) | SQLite, queryable history |
| Immutable audit trail | `audit.py` (`AuditLogger`) | Append-only SQLite, JSONL export |
| AI-readiness / trust scoring | `trust_score.py` | Weighted 0–100 score, letter grade |
| Verifiable pipeline composition | `examples/sample_pipeline.py` | End-to-end stage orchestration |
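The first two rows describe a zero-trust ingestion boundary: guard the extension, cap the size, and hash the raw bytes before any parser touches them. The sketch below shows that ordering in a minimal form. The names `ingest`, `IngestionRejected`, `ALLOWED_EXTENSIONS` and `MAX_BYTES`, and the specific limits, are hypothetical and are not the API of `ingestion.py`. `sha256_bytes` is only a stand-in for `ingestion._sha256()`.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

# Hypothetical allow-list and size cap; ingestion.py's actual values may differ.
ALLOWED_EXTENSIONS = {".csv", ".json", ".parquet"}
MAX_BYTES = 50 * 1024 * 1024  # 50 MiB


class IngestionRejected(Exception):
    """Raised when a file fails a zero-trust boundary check."""


@dataclass(frozen=True)
class IngestResult:
    path: str
    sha256: str
    size_bytes: int
    raw: bytes


def sha256_bytes(raw: bytes) -> str:
    # Stand-in for ingestion._sha256(): hash the bytes exactly as received.
    return hashlib.sha256(raw).hexdigest()


def ingest(path: Path, expected_sha256: Optional[str] = None) -> IngestResult:
    # 1. Extension guard: reject anything outside the allow-list.
    if path.suffix.lower() not in ALLOWED_EXTENSIONS:
        raise IngestionRejected(f"extension not allowed: {path.suffix!r}")
    # 2. Size limit: check file metadata before reading into memory.
    if path.stat().st_size > MAX_BYTES:
        raise IngestionRejected(f"file exceeds {MAX_BYTES} bytes")
    raw = path.read_bytes()
    # 3. Checksum on raw bytes, before any parsing; verify against a
    #    publisher-supplied digest when one is available.
    digest = sha256_bytes(raw)
    if expected_sha256 is not None and digest != expected_sha256.lower():
        raise IngestionRejected("checksum mismatch")
    return IngestResult(path=str(path), sha256=digest, size_bytes=len(raw), raw=raw)
```

A downstream parser would receive `result.raw` only after every check has passed. The recorded digest therefore identifies exactly the bytes that were parsed.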

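The trust-scoring row describes a weighted 0–100 score with a letter grade. The sketch below shows one way to compute such a score. It assumes each dimension has already been reduced to a value in [0, 1]. The dimension names, weights and grade cut-offs are hypothetical and are not taken from `trust_score.py`.

```python
from typing import Dict

# Hypothetical dimensions and weights; trust_score.py may use different ones.
WEIGHTS: Dict[str, float] = {
    "integrity": 0.30,  # e.g. checksum verified at ingestion
    "schema": 0.25,     # e.g. share of validation checks passed
    "policy": 0.25,     # e.g. share of policy rules not denied
    "lineage": 0.20,    # e.g. lineage recorded for every stage
}

# Hypothetical grade cut-offs, checked from highest to lowest.
GRADES = [(90.0, "A"), (80.0, "B"), (70.0, "C"), (60.0, "D"), (0.0, "F")]


def trust_score(components: Dict[str, float]) -> float:
    """Combine per-dimension scores in [0, 1] into a weighted 0-100 score."""
    if set(components) != set(WEIGHTS):
        raise ValueError("components must cover exactly the weighted dimensions")
    for name, value in components.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    total_weight = sum(WEIGHTS.values())
    weighted = sum(WEIGHTS[k] * components[k] for k in WEIGHTS)
    # Normalise by the total weight so the result stays on a 0-100 scale.
    return round(100 * weighted / total_weight, 1)


def letter_grade(score: float) -> str:
    for threshold, grade in GRADES:
        if score >= threshold:
            return grade
    return "F"
```

Dividing by the total weight keeps the score on a 0–100 scale even if the weights are retuned and no longer sum to exactly 1.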
Design decisions

Why YAML for policies? The paper argues that policy definitions should be separate from pipeline code and auditable as configuration artifacts. YAML satisfies both: it is human-readable, version-controllable, and parsed at runtime so policies can change without code changes.
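To make "declarative rules, per-rule decisions" concrete, the sketch below evaluates a parsed policy document against a dataset's column names. It assumes a hypothetical rule layout. Only the `pii_columns` rule name comes from the mapping above. The `required_columns` type, the `on_fail` key and the `allow`/`deny`/`flag` actions are illustrative. The policy is written as the dict that `yaml.safe_load()` would return for the YAML shown in the comment, which keeps the example free of third-party dependencies.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

# Hypothetical policy; equivalent to yaml.safe_load() of:
#
#   rules:
#     - {name: require_id, type: required_columns, columns: [id], on_fail: deny}
#     - {name: flag_pii, type: pii_columns, columns: [email, ssn], on_fail: flag}
POLICY: Dict[str, Any] = {
    "rules": [
        {"name": "require_id", "type": "required_columns", "columns": ["id"], "on_fail": "deny"},
        {"name": "flag_pii", "type": "pii_columns", "columns": ["email", "ssn"], "on_fail": "flag"},
    ]
}


@dataclass(frozen=True)
class Decision:
    rule: str
    passed: bool
    action: str  # "allow" when the rule passes, otherwise the rule's on_fail value


def evaluate(policy: Dict[str, Any], columns: List[str]) -> List[Decision]:
    present = set(columns)
    decisions = []
    for rule in policy["rules"]:
        listed = set(rule["columns"])
        if rule["type"] == "required_columns":
            passed = listed <= present          # every required column exists
        elif rule["type"] == "pii_columns":
            passed = not (listed & present)     # fails if any PII column is present
        else:
            raise ValueError(f"unknown rule type: {rule['type']!r}")
        action = "allow" if passed else rule["on_fail"]
        decisions.append(Decision(rule["name"], passed, action))
    return decisions
```

Each rule yields its own decision, so a caller can record exactly which rule triggered a denial. In line with the mapping table, the PII rule only flags the presence of PII columns and leaves masking to a later extension.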

Why SQLite for lineage and audit? The implementation targets local and single-node deployments. SQLite gives us ACID semantics and queryability without requiring a database server. The LineageTracker and AuditLogger interfaces are thin enough that the storage backend can be swapped (e.g., to PostgreSQL or DuckDB) by changing the connection string.
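An append-only audit store can be approximated in SQLite with triggers that abort any `UPDATE` or `DELETE`, as the minimal sketch below shows. The class name `AppendOnlyAuditLog`, its columns and its methods are hypothetical and are not the `AuditLogger` interface. The JSONL export serialises one JSON object per row.

```python
import json
import sqlite3
from datetime import datetime, timezone
from typing import List


class AppendOnlyAuditLog:
    """Hypothetical minimal audit log; not the AuditLogger API."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.executescript(
            """
            CREATE TABLE IF NOT EXISTS audit (
                id      INTEGER PRIMARY KEY AUTOINCREMENT,
                ts      TEXT NOT NULL,
                actor   TEXT NOT NULL,
                action  TEXT NOT NULL,
                outcome TEXT NOT NULL
            );
            -- Reject in-place edits and deletions at the database level.
            CREATE TRIGGER IF NOT EXISTS audit_no_update BEFORE UPDATE ON audit
            BEGIN SELECT RAISE(ABORT, 'audit log is append-only'); END;
            CREATE TRIGGER IF NOT EXISTS audit_no_delete BEFORE DELETE ON audit
            BEGIN SELECT RAISE(ABORT, 'audit log is append-only'); END;
            """
        )

    def log(self, actor: str, action: str, outcome: str) -> None:
        ts = datetime.now(timezone.utc).isoformat()
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT INTO audit (ts, actor, action, outcome) VALUES (?, ?, ?, ?)",
                (ts, actor, action, outcome),
            )

    def export_jsonl(self) -> List[str]:
        cur = self.conn.execute(
            "SELECT id, ts, actor, action, outcome FROM audit ORDER BY id"
        )
        cols = [c[0] for c in cur.description]
        # One JSON object per line, in insertion order.
        return [json.dumps(dict(zip(cols, row))) for row in cur]
```

The triggers block modification through SQL. They do not stop someone with file-system access from replacing the database file itself.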

Why separate lineage and audit stores? Lineage describes what happened to data; audit describes who did what and whether it succeeded. Mixing them conflates two distinct concerns. Keeping them separate simplifies querying and access control in multi-actor deployments.
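The lineage side of that split records data edges (which inputs produced which output, and at which stage) rather than actors and outcomes. The sketch below assumes a hypothetical single-table schema in its own SQLite database, not `LineageTracker`'s actual one. It answers the question "what did this dataset come from?" with a recursive CTE.

```python
import sqlite3
from typing import List


def open_lineage(path: str = ":memory:") -> sqlite3.Connection:
    # Lineage lives in its own database, separate from the audit store.
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS lineage (
               output_id TEXT NOT NULL,
               input_id  TEXT NOT NULL,
               stage     TEXT NOT NULL
           )"""
    )
    return conn


def record(conn: sqlite3.Connection, stage: str, inputs: List[str], output_id: str) -> None:
    # One edge per input consumed by the stage.
    with conn:
        conn.executemany(
            "INSERT INTO lineage (output_id, input_id, stage) VALUES (?, ?, ?)",
            [(output_id, i, stage) for i in inputs],
        )


def ancestors(conn: sqlite3.Connection, dataset_id: str) -> List[str]:
    # Follow input edges transitively; UNION (not UNION ALL) drops duplicates,
    # which also stops the recursion if the graph ever contains a cycle.
    rows = conn.execute(
        """
        WITH RECURSIVE up(id) AS (
            SELECT input_id FROM lineage WHERE output_id = ?
            UNION
            SELECT l.input_id FROM lineage AS l JOIN up ON l.output_id = up.id
        )
        SELECT id FROM up ORDER BY id
        """,
        (dataset_id,),
    ).fetchall()
    return [r[0] for r in rows]
```

For example, after recording `ingest` (`raw.csv` → `staged`), `validate` (`staged` → `validated`) and `join` (`validated`, `ref.csv` → `features`), calling `ancestors(conn, "features")` walks back through every upstream dataset. This query needs no actor or outcome columns, which is the practical payoff of keeping the two stores apart.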
