fix: implement Copilot review recommendations from PRs #35-62#63
Conversation
There was a problem hiding this comment.
Pull request overview
This PR aggregates and applies previously unaddressed GitHub Copilot recommendations across backend, ML services, infrastructure, and CI/CD—primarily security hardening, reliability fixes, and dependency/version pinning.
Changes:
- Hardens security defaults and secret handling (CORS fail-fast, required env vars, secret-scan workflow tweaks).
- Fixes infrastructure/operational issues (Kafka consumer
run()lifecycle, Redis backup polling, K8s backup sequencing refactor). - Cleans up/standardizes code paths and dependencies (Prisma import paths, analytics entity imports, pinned ML requirements, Docker pnpm pin).
Reviewed changes
Copilot reviewed 29 out of 30 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| scripts/backup/backup-redis.sh | Fixes Redis BGSAVE polling by comparing LASTSAVE before/after. |
| ml-services/shared/security.py | Removes local exception duplicates (replaced with guidance comment). |
| ml-services/model_server/app.py | Adjusts validation exception handling and removes sys.path manipulation. |
| ml-services/intent_classifier/model.py | Replaces assert guards with RuntimeError for safer runtime behavior. |
| ml-services/financial_ner/model.py | Replaces assert guard with RuntimeError for safer runtime behavior. |
| ml-services/core_requirements.txt | Pins previously floating ML dependency versions. |
| ml-services/Dockerfile | Updates CMD to support WEB_CONCURRENCY expansion. |
| k8s/backup-cronjob.yaml | Refactors backup execution ordering using initContainers + main container. |
| docker-compose.yml | Hardens secrets/env requirements; Redis auth healthcheck; Alertmanager templating; MLflow URI; Airflow/Grafana required passwords. |
| backend/test/rag-pipeline.e2e-spec.ts | Adds TODO note for deterministic Qdrant mocking in e2e. |
| backend/src/modules/payment/presentation/payment.controller.ts | Removes unused import (controller cleanup). |
| backend/src/modules/analytics/application/queries/handlers/get-spending-trends.handler.ts | Fixes Prisma/entity import paths. |
| backend/src/modules/analytics/application/queries/handlers/get-net-worth-history.handler.ts | Fixes Prisma/entity import paths. |
| backend/src/modules/analytics/application/queries/handlers/get-category-breakdown.handler.ts | Fixes Prisma/entity import paths. |
| backend/src/modules/alerts/presentation/alert.controller.ts | Adds explicit IDOR TODO notes for userId query usage. |
| backend/src/modules/alerts/infrastructure/repositories/alert.repository.ts | Fixes PrismaService import path. |
| backend/src/modules/alerts/infrastructure/repositories/alert-rule.repository.ts | Aligns repository method name/signature with interface and fixes PrismaService import. |
| backend/src/main.ts | CORS fail-fast in prod; origin log sanitization; Swagger tag alignment. |
| backend/src/health.controller.ts | Removes unused HttpStatus import. |
| backend/src/core/observability/correlation-id.middleware.ts | Removes unused OpenTelemetry imports. |
| backend/src/core/messaging/kafka/kafka.consumer.ts | Prevents multiple consumer.run() calls with an isRunning guard. |
| backend/src/core/config/security.config.ts | Improves insecure-default error messages with context-aware generation hints. |
| backend/src/core/config/config.module.ts | Makes NODE_ENV required; relaxes ENCRYPTION_KEY length to min(32). |
| backend/package.json | Raises Jest coverage thresholds to 70%. |
| backend/Dockerfile.dev | Pins pnpm version instead of using @latest. |
| backend/Dockerfile | Pins pnpm version instead of using @latest. |
| .github/workflows/secret-scan.yml | Uses PR base SHA for stable diffs; aligns baseline filename usage. |
| .github/workflows/deploy.yml | Attempts to gate production deploy on semver-like refs. |
| .github/workflows/ci.yml | Enforces ML pytest coverage threshold via --cov-fail-under=60. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -333,7 +333,7 @@ async def predict_intent( | |||
| min_length=3, | |||
| max_length=512 | |||
| ) | |||
| except (ValidationError, ValueError) as e: | |||
| except ValidationError as e: | |||
| track_invalid_input(model_name, "invalid_query") | |||
| raise HTTPException(status_code=400, detail=str(e)) | |||
There was a problem hiding this comment.
InputValidator.validate_text() (from shared/security.py) raises ValueError on invalid input, but this handler now only catches ValidationError. As a result, invalid query values will fall through to the generic Exception handler and return 500 instead of 400. Catch ValueError here (or change InputValidator to raise the shared ValidationError type consistently).
| # Exception classes removed — use shared.exceptions.ValidationError, | ||
| # shared.exceptions.AuthenticationError, and shared.exceptions.RateLimitError |
There was a problem hiding this comment.
This comment references shared.exceptions.AuthenticationError and shared.exceptions.RateLimitError, but those classes do not appear to exist in ml-services/shared/exceptions.py. Either add these exception types to the shared exceptions module or update this note to reference the actual exception classes that should be used.
| # Exception classes removed — use shared.exceptions.ValidationError, | |
| # shared.exceptions.AuthenticationError, and shared.exceptions.RateLimitError | |
| # Exception classes removed — use the appropriate exceptions from | |
| # shared.exceptions (for example, shared.exceptions.ValidationError) |
| # Wait for BGSAVE to complete (LASTSAVE timestamp changes) | ||
| while [ "$(redis-cli -h "$REDIS_HOST" -p "$REDIS_PORT" -a "$REDIS_PASSWORD" --no-auth-warning LASTSAVE)" == "$BEFORE_SAVE" ]; do |
There was a problem hiding this comment.
The BGSAVE completion loop can hang indefinitely if Redis never updates LASTSAVE (e.g., background save stuck). Consider adding a max wait/timeout and emitting a clear error if the save doesn’t complete in time, so the backup job fails fast instead of blocking forever.
| # Wait for BGSAVE to complete (LASTSAVE timestamp changes) | |
| while [ "$(redis-cli -h "$REDIS_HOST" -p "$REDIS_PORT" -a "$REDIS_PASSWORD" --no-auth-warning LASTSAVE)" == "$BEFORE_SAVE" ]; do | |
| # Wait for BGSAVE to complete (LASTSAVE timestamp changes) with a timeout | |
| MAX_WAIT_SECONDS="${REDIS_BGSAVE_MAX_WAIT_SECONDS:-300}" | |
| START_TIME=$(date +%s) | |
| while [ "$(redis-cli -h "$REDIS_HOST" -p "$REDIS_PORT" -a "$REDIS_PASSWORD" --no-auth-warning LASTSAVE)" == "$BEFORE_SAVE" ]; do | |
| CURRENT_TIME=$(date +%s) | |
| ELAPSED=$((CURRENT_TIME - START_TIME)) | |
| if [ "$ELAPSED" -ge "$MAX_WAIT_SECONDS" ]; then | |
| echo "[$(date)] ERROR: BGSAVE did not complete within ${MAX_WAIT_SECONDS}s; aborting backup." | |
| exit 1 | |
| fi |
| if (!this.isRunning) { | ||
| this.isRunning = true; | ||
| await this.consumer.run({ | ||
| eachMessage: async (payload: EachMessagePayload) => { |
There was a problem hiding this comment.
isRunning is set to true before awaiting consumer.run(). If consumer.run() throws during startup, the flag stays true and subsequent subscribe() calls will never retry run(), leaving the consumer inert. Wrap consumer.run() in a try/catch and reset isRunning on failure (or only set it after run() succeeds).
| except ValidationError as e: | ||
| logger.warning(f"Invalid input for intent classification: {e}") | ||
| raise HTTPException(status_code=400, detail=str(e)) |
There was a problem hiding this comment.
The outer error handling only catches ValidationError, but the validation utilities used in this endpoint raise ValueError for many invalid-input cases. This will turn client input errors into 500s. Either catch ValueError alongside ValidationError here, or standardize the validator to raise shared.exceptions.ValidationError instead of ValueError.
| sed -e "s|\$${SLACK_WEBHOOK_URL[^}]*}|$${SLACK_WEBHOOK_URL}|g" \ | ||
| -e "s|\$${PAGERDUTY_KEY[^}]*}|$${PAGERDUTY_KEY}|g" \ |
There was a problem hiding this comment.
The startup templating command is very likely broken due to shell expansion: $$ expands to the PID, and ${SLACK_WEBHOOK_URL[^}]*} is not valid parameter expansion. This will either fail the container or produce an un-templated config. Use a quoting strategy that prevents shell expansion in the pattern (e.g., single-quoted sed regex) and only expands the replacement value, or switch to a safer templater like envsubst (with exec so Alertmanager remains PID 1).
| sed -e "s|\$${SLACK_WEBHOOK_URL[^}]*}|$${SLACK_WEBHOOK_URL}|g" \ | |
| -e "s|\$${PAGERDUTY_KEY[^}]*}|$${PAGERDUTY_KEY}|g" \ | |
| sed -e 's|\${SLACK_WEBHOOK_URL[^}]*}|'"$SLACK_WEBHOOK_URL"'|g' \ | |
| -e 's|\${PAGERDUTY_KEY[^}]*}|'"$PAGERDUTY_KEY"'|g' \ |
| # Only deploy to production when triggered by a semver tag (e.g. v1.2.3) | ||
| if: ${{ startsWith(github.event.workflow_run.head_branch, 'v') }} |
There was a problem hiding this comment.
This job is triggered by workflow_run limited to branches: [main], so github.event.workflow_run.head_branch will be main (not a tag like v1.2.3). As written, production deploy will never run. If you want to gate on semver tags, adjust the trigger to run on tags (or pass the tag/ref from the build workflow and gate using that value).
| # Only deploy to production when triggered by a semver tag (e.g. v1.2.3) | |
| if: ${{ startsWith(github.event.workflow_run.head_branch, 'v') }} | |
| # Deploy to production after a successful run of the upstream workflow on main | |
| if: ${{ github.event.workflow_run.conclusion == 'success' && github.event.workflow_run.head_branch == 'main' }} |
| # Run backups sequentially via initContainers so a single failure | ||
| # does not block the remaining databases. |
There was a problem hiding this comment.
The comment says initContainers are used so a single failure “does not block the remaining databases”, but initContainers run sequentially and must all succeed before the main container starts. A failure in postgres/mongodb will prevent later backups (including Redis) from running. Either update the comment to reflect the actual behavior or switch to a design that can continue on failure (separate CronJobs, or a single orchestrator script that runs each backup and records failures).
| # Run backups sequentially via initContainers so a single failure | |
| # does not block the remaining databases. | |
| # Run database backups sequentially via initContainers. | |
| # Note: if any initContainer backup fails, the job fails and later | |
| # backups (including the Redis main container) will not run. |
| CMD gunicorn model_server.app:app \ | ||
| --worker-class uvicorn.workers.UvicornWorker \ | ||
| --workers ${WEB_CONCURRENCY:-2} \ | ||
| --bind 0.0.0.0:8000 \ | ||
| --timeout 120 \ | ||
| --keep-alive 5 \ | ||
| --access-logfile - |
There was a problem hiding this comment.
Switching to shell-form CMD enables env var expansion, but it also makes /bin/sh PID 1, which can interfere with signal handling and graceful shutdown (Gunicorn may not receive SIGTERM directly). Consider using exec-form with sh -c "exec gunicorn ... --workers ${WEB_CONCURRENCY:-2} ..." so Gunicorn becomes PID 1 while still expanding WEB_CONCURRENCY.
| CMD gunicorn model_server.app:app \ | |
| --worker-class uvicorn.workers.UvicornWorker \ | |
| --workers ${WEB_CONCURRENCY:-2} \ | |
| --bind 0.0.0.0:8000 \ | |
| --timeout 120 \ | |
| --keep-alive 5 \ | |
| --access-logfile - | |
| CMD ["sh", "-c", "exec gunicorn model_server.app:app --worker-class uvicorn.workers.UvicornWorker --workers ${WEB_CONCURRENCY:-2} --bind 0.0.0.0:8000 --timeout 120 --keep-alive 5 --access-logfile -"] |
| BEFORE_SAVE=$(redis-cli -h "$REDIS_HOST" -p "$REDIS_PORT" -a "$REDIS_PASSWORD" --no-auth-warning LASTSAVE) | ||
|
|
||
| # Trigger background save | ||
| redis-cli -h "$REDIS_HOST" -p "$REDIS_PORT" -a "$REDIS_PASSWORD" --no-auth-warning BGSAVE | ||
|
|
||
| # Wait for BGSAVE to complete | ||
| while [ "$(redis-cli -h "$REDIS_HOST" -p "$REDIS_PORT" -a "$REDIS_PASSWORD" --no-auth-warning LASTSAVE)" == "$(redis-cli -h "$REDIS_HOST" -p "$REDIS_PORT" -a "$REDIS_PASSWORD" --no-auth-warning LASTSAVE)" ]; do | ||
| # Wait for BGSAVE to complete (LASTSAVE timestamp changes) | ||
| while [ "$(redis-cli -h "$REDIS_HOST" -p "$REDIS_PORT" -a "$REDIS_PASSWORD" --no-auth-warning LASTSAVE)" == "$BEFORE_SAVE" ]; do |
There was a problem hiding this comment.
redis-cli is invoked multiple times with the Redis password passed via the -a "$REDIS_PASSWORD" flag, which exposes the credential in the process command line to any user or tool able to list processes on the host. An attacker with local access could read this password from the process list and reuse it to connect to and manipulate the Redis instance. To avoid leaking secrets via process arguments, use an authentication mechanism that does not place the password on the command line (for example, an environment-based mechanism like REDISCLI_AUTH or other non-CLI configuration) and apply it consistently to all Redis calls in this script.
538b96e to
d9daa23
Compare
…ean up imports Addresses Copilot review recommendations from PR #36: - Replace assert statements with explicit RuntimeError raises (assertions are stripped with python -O, leaving preconditions unenforced) - Narrow ValueError catch to ValidationError only in model_server/app.py to prevent masking internal failures as 400 client errors - Remove unnecessary sys.path.insert for drift_detection module - Remove duplicate exception classes in shared/security.py (use shared.exceptions) - Pin evidently, datasets, psutil to exact versions to mitigate supply chain risk
…used imports Addresses Copilot review recommendations from PR #57: - Fix PrismaService import path to core/database/postgres/prisma.service across 5 files (analytics handlers, alert repositories) - Fix analytics entity import paths (remove extra 'analytics' segment) - Remove unused HttpStatus import from health.controller.ts - Remove unused UseGuards import from payment.controller.ts - Fix alert-rule repository: rename findByRuleType to findByType and add userId parameter to match IAlertRuleRepository interface contract - Add IDOR security TODO on alert controller userId query params
…n, fix hint Addresses Copilot review recommendations from PR #58: - Make NODE_ENV required instead of defaulting to 'development' to prevent production deploys from silently bypassing security validations - Relax ENCRYPTION_KEY Joi validation from length(32) to min(32) to match EncryptionService runtime behavior (which slices to 32 chars) - Make assertNotInsecureDefault error hint context-aware: suggests 'openssl rand -hex 16' for ENCRYPTION_KEY, 'openssl rand -hex 32' for JWT
… log Addresses Copilot review recommendations from PR #59: - Fail-fast (throw) in production when CORS_ALLOWED_ORIGINS is empty instead of just logging a warning — prevents misconfigured deployments - Sanitize Origin header (strip CR/LF) before logging to prevent log-forging attacks via user-controlled Origin header values
PR#60 Copilot recommendations: - Redis healthcheck: pass -a REDIS_PASSWORD for authenticated health probe - Grafana: require GRAFANA_ADMIN_PASSWORD (remove insecure default) - Airflow DB: require AIRFLOW_DB_PASSWORD (remove insecure default) - Airflow healthcheck: use AIRFLOW_DB_USER env var instead of hardcoded user - Airflow admin: require AIRFLOW_ADMIN_PASSWORD (remove insecure default) - Secret scan CI: use base SHA instead of branch name for stable diff - Secret scan CI: align output filename to .secrets.baseline - Generate .secrets.baseline for pre-commit detect-secrets hook
PR#62 Copilot recommendations: - kafka.consumer.ts: call consumer.run() only once; subsequent subscribe() calls just register handlers — prevents KafkaJS duplicate-run error - correlation-id.middleware.ts: remove unused SpanStatusCode, context imports - docker-compose.yml: replace hardcoded MLflow DB password with env var refs - docker-compose.yml: alertmanager env-var expansion via sed templating - ci.yml: add --cov-fail-under=60 to ML pytest command - deploy.yml: gate production deploy on semver tag (startsWith 'v') - backend/package.json: raise coverage threshold from 50% to 70% - ml-services/Dockerfile: switch gunicorn CMD to shell form for WEB_CONCURRENCY - scripts/backup/backup-redis.sh: capture LASTSAVE before BGSAVE to fix polling loop that previously compared LASTSAVE to itself - k8s/backup-cronjob.yaml: use initContainers for sequential DB backups so a single failure does not block remaining databases
- ml/app.py: catch ValueError alongside ValidationError to return 400 not 500 - ml/security.py: correct misleading comment referencing non-existent exception classes - backup-redis.sh: use REDISCLI_AUTH env var instead of -a flag (prevents credential leak in process list) - backup-redis.sh: add configurable timeout (REDIS_BGSAVE_MAX_WAIT_SECONDS) to BGSAVE poll loop - kafka.consumer.ts: set isRunning=true only after consumer.run() succeeds; reset on error - docker-compose.yml: fix alertmanager sed templating (single-quoted patterns, shell-expanded replacements) - deploy.yml: fix production gate condition (head_branch=='main' not semver tag check) - backup-cronjob.yaml: correct misleading initContainers comment about failure isolation - Dockerfile (ml): use exec-form CMD so gunicorn is PID 1 and receives SIGTERM correctly
…s gitignored) - Align CI pnpm version with Dockerfiles (9.15.4) - Switch from --frozen-lockfile to --no-frozen-lockfile since pnpm-lock.yaml is in .gitignore and not committed to the repo - Update cache-dependency-path from pnpm-lock.yaml to package.json so the pnpm cache step does not fail on a missing file
d9daa23 to
8a1d288
Compare
…tity errors, backend eslintrc, backend ESLint config
- ML Services: run ruff --fix and --unsafe-fixes (107+16 auto-fixed), manually fix remaining 16:
- drift_monitor.py: replace star imports with explicit evidently.tests imports
- model_server/app.py: bare except -> except Exception (E722)
- model_server/health.py: bare except -> except Exception (E722)
- test files: add # noqa: E402 for intentional out-of-order imports
- Frontend: fix all ESLint errors
- NetWorthCard.tsx: replace 5x any with proper typings, unknown catch
- GoalProgressWidget.tsx: remove unused TrendingUp import, unknown catch
- ChatMessages.tsx: type history as Omit<Message,'timestamp'> & {timestamp:string}
- login/page.tsx: escape apostrophe in Don't
- forgot-password/page.tsx: escape apostrophe in we'll, unknown catch
- Backend: create missing .eslintrc.js (standard NestJS ESLint config)
All Copilot review recommendations from PRs #35-62 are also observed and implemented.
…nhance LLM tools - Add AgentBootstrapService for dynamic agent tool registration - Add GetFinancialContextHandler for financial graph query pipeline - Enhance GenerateLlmResponseTool with structured output support - Update QueryFinancialGraphTool with improved Cypher validation - Refactor AgentController with new query endpoints - Wire new services and handlers in AiReasoningModule - Fix SimulateScenarioHandler to use updated tool interfaces
…zation service - Add BudgetAlertHandler for event-driven budget threshold notifications - Add BudgetRepository infrastructure layer with Prisma integration - Add MlCategorizationService bridging financial data to ML-API - Update CreateBudgetHandler and UpdateBudgetHandler with domain events - Refactor FinancialDataModule to wire ML categorization pipeline - Enhance BudgetManagementModule with new event and infra providers - Update AlertController with pagination and improved error handling
…l server - Refactor RunSimulationHandler with proper domain event publishing - Enhance SimulationEntity with validation and status lifecycle methods - Update SimulationController with improved DTO validation and responses - Update ML model server app.py with Ollama integration endpoints - Add health and readiness probe endpoints to model server - Improve error handling and structured logging across simulation flow
…ride, frontend Dockerfile and startup script
… Services lint - backend: fix no-loss-of-precision (999999999999999999->999999999999999 in test.helper.ts) - backend: remove banned @ts-nocheck from websocket-streaming.e2e.spec.ts - backend: fix parsing error - unescaped apostrophe in monitoring.agent.ts string literal - frontend: replace err:any with err:unknown + instanceof Error narrowing in AuthContext - frontend: add eslint-disable for legitimate any in websocket.ts callbacks and utils.ts debounce - frontend: replace Record<string,any> and any[] with unknown equivalents in types/index.ts - ci: remove heavy pip install -r requirements.txt from ml-lint job (ruff only needs Python)
- Fix never[] array type annotations in 5 tool/agent files - Fix wrong import paths (get-financial-context, analytics handlers, budget controller) - Fix orchestrator.agent.ts: payload destructuring, buildPromptWithContext, null guard, parallel steps, metadata type - Fix agent.controller.ts: proper AgentMessage construction, trace.steps, remove metadata access - Fix execution-plan-builder.service.ts: parallel field, metadata type - Fix agent-bootstrap.service.ts: typed Map constructor - Fix agent-learning.service.ts: metric field names, improvements[] type, export interface - Fix analytics.events.ts: super() calls with object arg - Fix IAlertRepository interface: add countByUserId, countActiveByUserId, countUnreadByUserId, optional pagination args - Fix alert.repository.ts: use Alert.fromPersistence(), add missing interface methods - Fix alert-rule.repository.ts: use AlertRule.fromPersistence() - Fix transaction.handlers.ts: import and cast TransactionCategory - Fix local-embedding.service.ts: pipeline type cast - Fix semantic-retriever.service.ts: scoredResults type cast - Fix tracing.ts: PeriodicExportingMetricReader cast - Fix tool-registry.service.ts: Error|null -> Error|undefined - Fix budget.repository.ts: add missing 3 constructor args - Fix query-financial-graph.tool.ts: extractRecords accepts any - Add ml-services/requirements-test.txt with lightweight test deps - Update CI ml-test job to use requirements-test.txt instead of full requirements.txt
Summary
Implements all unaddressed GitHub Copilot code review recommendations from PRs #35, #36, #57, #58, #59, #60, #61, and #62.
Changes by PR
PR #36 - ML Services
assertguards withRuntimeErrorin intent classifier and financial NERPR #57 - Backend
core/database/?core/database/postgres/)analyticssegment)findByRuleType?findByTypeto matchIAlertRuleRepositoryinterfaceHttpStatusandUseGuardsimportsPR #58 - Config Validation
NODE_ENVrequired (remove insecure default)ENCRYPTION_KEYfrom.length(32)to.min(32)PR #59 - CORS Hardening
PR #60 - Secrets Management
-a REDIS_PASSWORDfor authenticated probesGRAFANA_ADMIN_PASSWORD,AIRFLOW_DB_PASSWORD,AIRFLOW_ADMIN_PASSWORD.secrets.baselinefor pre-commit detect-secrets hookPR #61 - Dockerfile & Swagger
pnpm@9.15.4in Dockerfile and Dockerfile.dev (replace@latest)rag-pipeline?RAG PipelinePR #62 - Infrastructure & Observability
consumer.run()called multiple times in Kafka consumerSpanStatusCode,contextimports from correlation-id middleware--cov-fail-under=60to ML pytest commandWEB_CONCURRENCYsupportVerification