
Cloud gateway v next #87

Open
shleder wants to merge 12 commits into main from cloud-gateway-vNext

shleder (Owner) commented May 29, 2026
  • describe the user-visible or maintainer-visible change

  • list the main files or subsystems touched

  • explain the problem being solved

  • explain how the change fits the fail-closed stdio-first product shape

  • npm run verify:all

  • npm run demo:stdio if runtime or trust-gate behavior changed

  • npm run benchmark:stdio -- --json --output evidence.json if security claims, benchmark corpus, or cache behavior changed

  • npm run pack:dry-run && npm run pack:smoke if packaging, CLI surface, docs install commands, or release workflows changed

  • docs updated if claims, demos, release notes, or repo metadata changed

  • note any residual risks, unsupported claims, metrics changes, or follow-up work

shleder added 3 commits May 29, 2026 14:54
Snapshot of the newer cloud API gateway / MCP Trust-Gates firewall for
external architecture review. Published to a dedicated branch so GitHub
main (older) is not overwritten.

Scope:
- Cloud gateway src/ (auth, tenant isolation, SSRF filter + IP pinning,
  schema/AST/honeytoken/scope/preflight gates, per-tenant token bucket,
  optional AI jailbreak guard, dynamic policy, BYOT tool registry).
- PostgreSQL + pgvector data layer (reader/writer split, migrations,
  semantic + L2 cache, billing idempotency).
- Stripe billing (checkout, portal, webhook with HMAC + replay window +
  idempotency), Resend email, SIEM streamer, Prometheus metrics.
- Fly.io + Docker deployment, monitoring stack, compatibility layer
  (OpenAI/Anthropic), workspaces (langchain, vercel-ai, dashboard, portal).
- AI knowledge base under docs/ai-context/ and Kiro steering under
  .kiro/steering/.
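
For reviewers unfamiliar with the "per-tenant token bucket" named in the scope, the general technique can be sketched as follows. This is a minimal illustration, not the gateway's implementation: `TenantTokenBucket`, `tryConsume`, and the capacity and refill parameters are all hypothetical names, and the clock is passed in explicitly so the behaviour is deterministic.

```typescript
// Illustrative per-tenant token bucket; all names here are hypothetical,
// not the gateway's actual API.
interface Bucket {
  tokens: number;
  lastRefillMs: number;
}

class TenantTokenBucket {
  private readonly buckets = new Map<string, Bucket>();
  private readonly capacity: number;
  private readonly refillPerSecond: number;

  constructor(capacity: number, refillPerSecond: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
  }

  // Returns true and spends one token if the tenant has budget left.
  tryConsume(tenantId: string, nowMs: number): boolean {
    const bucket = this.buckets.get(tenantId) ?? {
      tokens: this.capacity,
      lastRefillMs: nowMs,
    };
    // Refill in proportion to elapsed time, capped at the bucket capacity.
    const elapsedSeconds = Math.max(0, nowMs - bucket.lastRefillMs) / 1000;
    bucket.tokens = Math.min(
      this.capacity,
      bucket.tokens + elapsedSeconds * this.refillPerSecond,
    );
    bucket.lastRefillMs = Math.max(bucket.lastRefillMs, nowMs);
    this.buckets.set(tenantId, bucket);

    if (bucket.tokens < 1) return false;
    bucket.tokens -= 1;
    return true;
  }
}
```

Because each tenant ID maps to its own bucket, one tenant exhausting its budget does not affect another tenant's requests.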

Excluded (secrets/artifacts): all .env* except .env.example, logs,
test-results, node_modules, local DB/cache, loose native binaries.

NOT production-ready: npm run assert:package-metadata fails (package.json
files[] omits dist/utils/child-env.*) and DB-dependent test suites were
not run locally (no DATABASE_URL). See docs/ai-context/CHANGELOG_FOR_AI.md.
…, prod blockers, docs)

- package metadata: remove dist/utils/child-env.{js,d.ts} from package.json
  files[] and lock them in scripts/assert-package-metadata.mjs forbiddenFiles.
  child-env is only used by the unpublished gateway-config / stdio paths and
  is not part of the published lib.js surface. `npm run assert:package-metadata`
  and `npm run verify:all` are now GREEN.
- prod boot guard: add validateProductionDatabaseUrl() in src/index.ts. When
  NODE_ENV=production and neither DATABASE_URL nor MASTER_DATABASE_URL is set,
  the process refuses to start. Defense-in-depth: /health returns 503 (not
  healthy) in production when the DB is unconfigured (no serving from
  in-memory stores).
- prod blocker (documented, not fixed): Postgres TLS rejectUnauthorized:false
  in src/database/postgres-pool.ts annotated with TODO(vNext, prod-blocker);
  not claimed secure.
- prod blocker (documented, not fixed): trust proxy 'loopback' in src/index.ts
  annotated with TODO(vNext, prod-blocker) for Fly/edge deployments.
- docs: PROJECT_SNAPSHOT.md + CHANGELOG_FOR_AI.md now reflect branch
  cloud-gateway-vNext, clean tree, successful push; the stale ABORTED release
  event is superseded by a successful publish event; critical-file hashes
  recomputed. SECURITY_AUDIT.md F-01/F-02 marked with vNext status.
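
The prod boot guard described in the list above could look roughly like the sketch below. The function name `validateProductionDatabaseUrl` comes from this commit, but its real signature is not shown here, so taking the environment as a parameter is an assumption, as are the helpers `isDatabaseConfigured` and `healthStatusCode` and the treatment of whitespace-only values as unset.

```typescript
// Sketch under assumptions: only the environment variable names are taken
// from the commit message; signatures and helper names are hypothetical.
interface BootEnv {
  NODE_ENV?: string;
  DATABASE_URL?: string;
  MASTER_DATABASE_URL?: string;
}

function isDatabaseConfigured(env: BootEnv): boolean {
  // Assumption: a whitespace-only value counts as unset.
  return Boolean(env.DATABASE_URL?.trim() || env.MASTER_DATABASE_URL?.trim());
}

// Boot guard: refuse to start in production without a database URL.
function validateProductionDatabaseUrl(env: BootEnv): void {
  if (env.NODE_ENV === "production" && !isDatabaseConfigured(env)) {
    throw new Error(
      "Refusing to start: production requires DATABASE_URL or MASTER_DATABASE_URL",
    );
  }
}

// Defense-in-depth: /health reports 503 instead of serving from in-memory stores.
function healthStatusCode(env: BootEnv): number {
  return env.NODE_ENV === "production" && !isDatabaseConfigured(env) ? 503 : 200;
}
```

In a real entry point the guard would presumably be called with `process.env` before the HTTP server starts listening.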

Verification: assert-package-metadata PASS, typecheck PASS, build PASS,
test 467 passed / 3 skipped, verify:all GREEN. NOT production-ready: see
production_blockers in CHANGELOG_FOR_AI.md. main untouched.

vercel Bot commented May 29, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| toolwall | Ready | Preview, Comment | May 29, 2026 7:24pm |
| toolwall-cvep | Ready | Preview, Comment | May 29, 2026 7:24pm |

shleder added 2 commits May 29, 2026 15:44
…ked CI

Fixes the two highest-risk production blockers on cloud-gateway-vNext.

F-01 Postgres TLS certificate verification:
- Add resolvePostgresTls() (testable, fail-closed). Non-local DBs now use
  rejectUnauthorized:true with optional CA from PG_CA_CERT (inline PEM) or
  PGSSLROOTCERT (file path), else the system CA store. Production rejects
  sslmode=disable and PG_TLS_INSECURE at config time. Local dev/test
  (localhost) stays no-TLS. Never logs URL/password/CA contents.
- Tests: tests/postgres-tls.test.ts.
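
The F-01 rules above can be read as a small decision function. The sketch below is one possible shape, not the code in this PR: the name `resolvePostgresTls` and the environment variables are from the commit message, while the parameter list, the injected `readFile`, the set of hosts treated as local, and the order of the checks are assumptions. Outside production the sketch ignores `PG_TLS_INSECURE` entirely, since the commit message only specifies the production behaviour.

```typescript
// Sketch only: the real resolvePostgresTls() signature is not shown in this PR.
interface TlsEnv {
  NODE_ENV?: string;
  PG_CA_CERT?: string; // inline PEM
  PGSSLROOTCERT?: string; // path to a PEM file
  PG_TLS_INSECURE?: string;
}

type PostgresTls = false | { rejectUnauthorized: true; ca?: string };

// Assumption: these hostnames count as "local dev/test".
const LOCAL_HOSTS = new Set(["localhost", "127.0.0.1", "[::1]"]);

function resolvePostgresTls(
  databaseUrl: string,
  env: TlsEnv,
  readFile: (path: string) => string, // injected so the function stays testable
): PostgresTls {
  let url: URL;
  try {
    url = new URL(databaseUrl);
  } catch {
    // Deliberately generic: never echo the URL, it may contain a password.
    throw new Error("Invalid database URL");
  }

  // Fail closed at config time in production.
  if (env.NODE_ENV === "production") {
    if (url.searchParams.get("sslmode") === "disable") {
      throw new Error("sslmode=disable is not allowed in production");
    }
    if (env.PG_TLS_INSECURE) {
      throw new Error("PG_TLS_INSECURE is not allowed in production");
    }
  }

  if (LOCAL_HOSTS.has(url.hostname)) return false; // local dev/test: no TLS

  // Inline PEM wins over a file path; with neither, the system CA store applies.
  const ca =
    env.PG_CA_CERT ??
    (env.PGSSLROOTCERT ? readFile(env.PGSSLROOTCERT) : undefined);
  return ca ? { rejectUnauthorized: true, ca } : { rejectUnauthorized: true };
}
```

The return value has the shape that node-postgres accepts for its `ssl` pool option, so a caller could pass it through unchanged. Note that no error message includes the URL or the CA contents.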

F-02 reverse-proxy / client-IP trust:
- Add src/config/proxy-trust.ts: resolveTrustProxySetting() drives
  app.set('trust proxy', ...) and FAILS LOUD in production when
  MCP_TRUST_PROXY is unset/"true"/garbage. fly.toml sets MCP_TRUST_PROXY=1.
- Color-boundary state now keyed by buildColorBoundaryKey (tenant-namespaced,
  never raw IP alone) in both the middleware and the dispatcher, so two
  tenants behind one proxy IP cannot share boundary state.
- HTTP_REQUEST audit now records clientIp + proxyIp.
- Tests: tests/proxy-trust.test.ts (incl. Express XFF integration).
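
A hedged sketch of the two F-02 pieces follows. The names `resolveTrustProxySetting`, `buildColorBoundaryKey`, and `MCP_TRUST_PROXY` are from the commit message; everything else is assumed for illustration, including the accepted values (a positive hop count or one of Express's named ranges), the non-production fallback to `"loopback"`, and the key format.

```typescript
// Sketch under assumptions: accepted values, fallback, and key format are
// illustrative and not confirmed by this PR.
interface ProxyEnv {
  NODE_ENV?: string;
  MCP_TRUST_PROXY?: string;
}

type TrustProxySetting = number | "loopback" | "linklocal" | "uniquelocal";

function resolveTrustProxySetting(env: ProxyEnv): TrustProxySetting {
  const raw = env.MCP_TRUST_PROXY?.trim();
  // A positive integer is a hop count, e.g. MCP_TRUST_PROXY=1 in fly.toml.
  if (raw !== undefined && /^[1-9]\d*$/.test(raw)) return Number(raw);
  if (raw === "loopback" || raw === "linklocal" || raw === "uniquelocal") {
    return raw;
  }
  // Unset, "true", or garbage: fail loud in production.
  if (env.NODE_ENV === "production") {
    throw new Error(
      "MCP_TRUST_PROXY must be a hop count or a named range in production",
    );
  }
  return "loopback"; // assumed non-production fallback
}

// Tenant-namespaced key: two tenants behind one proxy IP get different keys.
// JSON encoding keeps the boundary unambiguous even though IPv6 addresses
// and tenant IDs may themselves contain ":".
function buildColorBoundaryKey(tenantId: string, clientIp: string): string {
  return JSON.stringify([tenantId, clientIp]);
}
```

With Express, the resolved value would be handed to `app.set('trust proxy', value)`. Rejecting `"true"` matters because trusting every hop lets any client spoof its address through `X-Forwarded-For`.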

DB-backed CI:
- Add .github/workflows/ci-db.yml: runs the full suite against
  pgvector/pgvector:pg16 with DATABASE_URL set, creates the vector
  extension, and fails if DB-dependent suites self-skip.

Docs: SECURITY_AUDIT (F-01/F-02 FIXED), RUNTIME_AND_DEPLOYMENT (TLS +
proxy + boot guards), TESTING_GAPS (validation tiers), PROJECT_SNAPSHOT
(blockers), CHANGELOG_FOR_AI (hashes + release event #3). .env.example
documents the new knobs.

No feature work. SSRF/tenant-isolation/auth/rate-limit/schema/cache-poison
protections unchanged. Local verify:all GREEN (24 suites, 499 passed,
3 skipped). DB-backed path runs in CI. main untouched.
Replace the placeholder/stale SHA in CHANGELOG_FOR_AI.md with the actual
HEAD of the TLS + reverse-proxy hardening commit. No code change.
shleder added 2 commits May 29, 2026 16:04
Ledger (docs/ai-context/CHANGELOG_FOR_AI.md):
- git_commit_head and release_event_3.new_head now equal the actual
  branch HEAD (e281ff1), replacing the stale f9426c5.
- record explicit commit list: f9426c5 (TLS/proxy hardening) +
  e281ff1 (ledger SHA precision).
- record the DB-backed CI observation on e281ff1 and the guard fix.

CI (.github/workflows/ci-db.yml):
- The DB suites ran AND passed against pgvector on run 26638123393,
  but the final visibility-guard step false-failed: it grepped
  `jest --verbose` run output for "tests/<suite>.test.ts", a string
  jest does not emit verbatim.
- Replace it with a deterministic `jest --listTests` enumeration
  (respects testPathIgnorePatterns, prints absolute paths) run BEFORE
  the suite, proving the DB suites are in the run set when
  DATABASE_URL is set. The run step is now a plain `npm test`.
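
The enumeration-based guard can be illustrated with a small checker over the output of `jest --listTests`, which prints one absolute path per line. This is a sketch of the idea rather than the actual workflow step: `findMissingSuites` is a hypothetical helper, and the suite paths in the usage note are placeholders.

```typescript
// Hypothetical helper: returns the required suites that do NOT appear in the
// `jest --listTests` output (one absolute path per line).
function findMissingSuites(
  listTestsOutput: string,
  requiredSuites: string[],
): string[] {
  const listed = listTestsOutput
    .split(/\r?\n/)
    .map((line) => line.trim().replace(/\\/g, "/")) // tolerate Windows paths
    .filter((line) => line.length > 0);

  // Match on a full path-segment boundary so "other-tests/x.test.ts"
  // cannot satisfy a requirement for "tests/x.test.ts".
  return requiredSuites.filter(
    (suite) =>
      !listed.some((path) => path === suite || path.endsWith("/" + suite)),
  );
}
```

A CI step could pipe the command's output into a script built on this helper and exit non-zero when the returned list is non-empty, for example when `tests/example-db.test.ts` (a placeholder name) is required but excluded by `testPathIgnorePatterns`.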

No application source changed. main untouched.
shleder added 2 commits May 29, 2026 20:06
snapshot.git_commit_head was stale (8b00727 while branch tip was dce23a3). Add release_event_5 recording the actual DB-backed CI conclusion on the current HEAD (run 26638935114 = FAILURE: guard fix worked, ~15 DB suites genuinely fail) and correct event_4's unverified 'green' prediction. Explicitly record f9426c5 (TLS/proxy hardening) and e281ff1 (ledger SHA precision) in the commit ledger. Docs-only; no application logic changed; main untouched.
Trailing 1-line head-sync commit (same pattern as dce23a3->8b00727). Sets snapshot.git_commit_head to the substantive ledger commit a293778 and updates self_hash_prefix to F45E65EED9D9E4B4. Docs-only; main untouched.
shleder added 2 commits May 29, 2026 20:15
…failure)

Upgrade event_5 ci_trigger_decision from prediction to verified fact: the docs-only push (a293778+a422e71) triggered run 26651124794 on a422e71 which concluded FAILURE (no-DB gate green, DB-integration red) - same root cause, as expected for a docs-only change. Below the self-hash marker, so pre-marker self-hash is unchanged. Docs-only; main untouched.
Trailing 1-line head-sync commit. Sets snapshot.git_commit_head=24046c2 (the verified-CI-rerun ledger commit), appends a293778+24046c2 to git_commit_base lineage, and updates self_hash_prefix to C0A47AEFC142D57F. Docs-only; main untouched.
Migrate ~15 DB-backed test suites off the removed SQLite path (Phase 39 SQLite->Postgres) so they run correctly under DB-backed CI. The suites self-skipped locally (no DATABASE_URL), which hid the breakage; CI runs them against pgvector, where they failed at load/assert time.
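
Why a self-skipping suite hides breakage can be shown with a minimal gate. This is a sketch, not the repository's `describeWithDb`: `makeDescribeWithDb` and `SuiteFn` are hypothetical, and the runner functions are passed in so the example does not depend on Jest being loaded.

```typescript
// Hypothetical gate: picks the real runner only when a database is configured.
type SuiteFn = (name: string, body: () => void) => void;

function makeDescribeWithDb(
  env: { DATABASE_URL?: string },
  run: SuiteFn,
  skip: SuiteFn,
): SuiteFn {
  return env.DATABASE_URL ? run : skip;
}

// Under Jest this might be wired as (assumption, not shown in this PR):
//   const describeWithDb = makeDescribeWithDb(process.env, describe, describe.skip);
```

When `DATABASE_URL` is unset the suite body never executes, so a stale import or a failing assertion inside it goes unnoticed until a DB-backed run exercises it.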

Changes (tests only; no application source touched):

- Replace deleted imports ../src/database/sqlite-pool.js and ../src/cache/semantic-store-sqlite.js with the Postgres API (postgres-pool / semantic-store-postgres).

- Await now-async calls (key-registry, tiers, rate-limiter, pending-checkouts, metrics aggregator, semantic store) that the tests previously consumed synchronously.

- Use describeWithDb + setupDbHarness for DB-touching suites (self-skip without DATABASE_URL, run migrations + truncate in CI) instead of initializePersistentStores/MCP_DB_MEMORY.

- semantic-caching: migrate to pgvector API (findSemanticHit miss => undefined; isSemanticCacheEnabled() no-arg; drop removed save-cap options).

- production-seeding: markerDir API; drop SQLite-only resolveDbFile/.sqlite assertions (no Postgres equivalent).

- production-email: replace SQLite pool-flush shutdown test with a Postgres-path graceful-shutdown assertion.

- tier-rate-limiting/token-bucket: await async store calls; seed permissive policy so dispatcher integration tests assert tier/bucket behaviour without fail-closing at the policy gate.
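
The "await now-async calls" item in the list above addresses a failure mode that is easy to miss: once a store method returns a Promise, code that still treats the return value as the result sees a truthy object, whatever the underlying answer is. The sketch below uses a hypothetical `KeyRegistry` to illustrate this; it is not the project's key-registry API.

```typescript
// Hypothetical registry whose lookup became async after the Postgres migration.
interface KeyRegistry {
  isRevoked(apiKey: string): Promise<boolean>;
}

function createRegistry(revokedKeys: string[]): KeyRegistry {
  const revoked = new Set(revokedKeys);
  return { isRevoked: async (apiKey) => revoked.has(apiKey) };
}

// Pre-migration style: consumes the return value synchronously.
function isRevokedSyncStyle(registry: KeyRegistry, apiKey: string): boolean {
  const result: unknown = registry.isRevoked(apiKey);
  return Boolean(result); // always true: a Promise object is truthy
}

// Migrated style: awaits the Promise and returns the actual answer.
async function isRevokedAwaited(
  registry: KeyRegistry,
  apiKey: string,
): Promise<boolean> {
  return await registry.isRevoked(apiKey);
}
```

For a key that is not revoked, the sync-style helper still returns `true`, while the awaited helper resolves to `false`. Test assertions written against the old synchronous API can pass or fail for the wrong reason in the same way.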

No security invariant weakened: tenant isolation, auth 401s, billing replay/idempotency, signature refusal, revocation, and semantic-cache tenant isolation assertions are all preserved. DB-suite visibility guard in ci-db.yml unchanged and still valid.