CSRD-Lake is a portfolio reference implementation, not a production system. It does not implement bank-grade security controls.
- ❌ No authentication / RBAC on the dashboard or API surface
- ❌ No SOC 2 / ISO 27001 / HDS controls
- ❌ No bank-confidential data ingestion (CAC 40 / DAX 40 sustainability reports are public)
- ❌ No customer PII handling
- ❌ No audit logging beyond Airflow's built-in task logs and Snowflake's query history
If you find a security issue in any code in this repo (e.g., a path traversal in the downloader, a SQL-injection seam in the loader, an API-key leak), please do not open a public issue. Instead:
- Email: `pyaesonekyaw101010@gmail.com`
- Subject line: `[CSRD-Lake security]`

I will respond within 7 days.

If you adapt this code for production use, at minimum:
- Move secrets out of `.env`: use Azure Key Vault, AWS Secrets Manager, or GCP Secret Manager.
- Replace `password` Snowflake auth with key-pair authentication or SSO (Azure AD / Okta).
- Sandbox the LLM calls: Anthropic / Mistral may receive corporate disclosures with prompt-injection attempts embedded in the scraped PDFs. Use Anthropic's prompt caching and pin the `anthropic-version` header.
- Validate downloaded PDFs beyond magic bytes: add MIME sniffing and a virus scan (ClamAV / Defender) before extraction.
- Add row-level security in Snowflake: `mart_disclosure_review_queue` may contain low-confidence values that should not surface to consumers without role-gating.
- Pin all dependencies and run `pip-audit` weekly in CI.
- Enable Snowflake network policies to restrict access to known CIDR ranges.
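
As one illustration of the PDF-validation item above, here is a minimal, standard-library-only sketch of structural checks that go a step beyond the magic bytes. It is not the repo's downloader code: the names `check_pdf_bytes`, `check_pdf_file`, `MAX_PDF_BYTES`, and `RISKY_TOKENS` are hypothetical, and the size cap and token list are assumptions chosen for the example. The token scan is a plain byte search, so it will miss names hidden in compressed object streams or written with hex escapes; it complements a real scanner such as ClamAV and does not replace one.

```python
from pathlib import Path

# Hypothetical limits for illustration; tune them for the actual corpus.
MAX_PDF_BYTES = 50 * 1024 * 1024
# PDF name tokens commonly associated with active or embedded content.
RISKY_TOKENS = (b"/JavaScript", b"/JS", b"/Launch", b"/EmbeddedFile")


def check_pdf_bytes(data: bytes) -> list[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if len(data) > MAX_PDF_BYTES:
        problems.append("file exceeds size cap")
    # Magic bytes: a PDF starts with a %PDF- header.
    if not data.startswith(b"%PDF-"):
        problems.append("missing %PDF- header")
    # A complete PDF ends with an %%EOF marker; search the last 1 KiB.
    if b"%%EOF" not in data[-1024:]:
        problems.append("missing %%EOF trailer")
    # Heuristic byte search for active-content markers (uncompressed only).
    for token in RISKY_TOKENS:
        if token in data:
            problems.append(f"contains active-content token {token.decode()}")
    return problems


def check_pdf_file(path: Path) -> list[str]:
    """Check the size on disk first so oversized files are never read into memory."""
    if path.stat().st_size > MAX_PDF_BYTES:
        return ["file exceeds size cap"]
    return check_pdf_bytes(path.read_bytes())
```

A downloader could call `check_pdf_file(path)` right after saving a file and quarantine anything that returns a non-empty list, before the file reaches text extraction or an LLM call.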