
[STORY] Transition cleanup: remove launch keys from config.json (next release) #1196

Description

@jsbattig

Part of: #1194

Story: Transition cleanup: remove launch keys from config.json (next release)

Feature: Web UI + Transition Cleanup

Part of: #EPIC

[Conversation Reference: Proposal plans/designs/launch-config-cluster-propagation-proposal.md §9 (critical back-compat + C-A5 + MAJOR-5: this release keeps the four keys in config.json via TWO mechanisms — the strip-guard AND the write-path inclusion; a later release REMOVES BOTH and strips them once all nodes are new), §5/§7/§10 (two-snapshot APPLY/DEPLOY value precedence; DEPLOY NEVER reads launch.json/TARGET — MAJOR-M1; log_level is in-process NOT ExecStart — CRITICAL-A; MINOR-8 ServerConfig port-8000 default), §12 Story 6 (FIX-1: post-cleanup DEPLOY = applied_launch.json → parse/preserve the current live ExecStart → ServerConfig defaults, NEVER launch.json; drop the applied-worker-count resolver's config.json fallback in lockstep — Codex Major + Opus M1). PASS-5 FIX-1: Story 6 must NOT re-add launch.json/TARGET to the DEPLOY chain and must drop the resolver's config.json rung together with the config.json launch copies.]

Story Overview

Objective: After all nodes have upgraded past the back-compat window (this is a NEXT-RELEASE follow-up), stop writing the workers/log_level/host/port copies to config.json by REMOVING BOTH Story 1 transition mechanisms (MAJOR-5): (a) the C-A5 strip-guard (the TRANSITION_PRESERVE_KEYS allow-list), so the existing _strip_config_file_to_bootstrap() strips those four keys again, AND (b) the config.json WRITE-path inclusion in save_config(), so normal saves no longer write the four keys. Also drop the readers' config.json fallback for them, so the shared DB, launch.json and applied_launch.json are the sole sources for these four launch settings. CRITICALLY (FIX-1): removing the config.json rung must NOT re-add launch.json/TARGET to the DEPLOY chain. A routine deploy still calls the ensure path at deployment_executor.py:2890, so falling through to TARGET would silently apply a saved-but-unconfirmed launch change, re-opening decision #3 / MAJOR-M1. The post-cleanup DEPLOY chain is applied_launch.json → parse/preserve the current live ExecStart → ServerConfig defaults, NEVER launch.json. The applied-worker-count resolver's config.json fallback (Story 1) is dropped in lockstep: the post-cleanup resolver chain is applied_launch.json.workers → ServerConfig default (1).

User Value: Removes the transition redundancy and the risk of a stale config.json fallback diverging from shared state, completing the migration to a single source of truth — without re-opening the over-apply bug (DEPLOY never reads TARGET) and without the worker-count resolver silently reading a stale config.json after the copies are gone.

Acceptance Criteria Summary: The four launch keys are no longer written to config.json (BOTH the strip-guard AND the write-path inclusion are removed); _strip_config_file_to_bootstrap() strips them; readers no longer depend on them; the auto-updater's value resolution degrades to the two-snapshot precedence WITHOUT the config.json launch fallback and WITHOUT re-introducing TARGET into DEPLOY (APPLY: launch.json → ServerConfig defaults; DEPLOY: applied_launch.json → parse/preserve the current live ExecStart → ServerConfig defaults, NEVER launch.json); the applied-worker-count resolver drops its config.json rung in lockstep (applied_launch.json.workers → ServerConfig default 1); bootstrap-only keys remain in config.json. Two FIX-1 tests prove (a) a saved TARGET is NOT applied by a DEPLOY after cleanup, and (b) a node missing applied_launch.json falls to the default 1, not to config.json.

Acceptance Criteria

AC1: BOTH Story 1 transition mechanisms are removed; the four keys are stripped from and no longer written to config.json (MAJOR-5)

Scenario: A new-release node boots and runs initialize_runtime_db() → _strip_config_file_to_bootstrap(), and later performs a normal settings save.

Given all nodes are on the new release (back-compat window has passed)
And the Story 1 TRANSITION_PRESERVE_KEYS strip-guard has been removed
And the Story 1 config.json WRITE-path inclusion has been removed
When initialize_runtime_db(db_path) runs and calls _strip_config_file_to_bootstrap()
Then config.json does NOT contain workers/log_level/host/port copies (now stripped)
And config.json still contains the bootstrap-only keys (server_dir, postgres_dsn, storage_mode, cluster, ontap, cluster.node_id)
When a subsequent normal save_config() runs
Then config.json STILL does NOT contain workers/log_level/host/port (the write-path no longer includes them)

Technical Requirements:

  • REMOVE the Story 1 C-A5 transition allow-list (TRANSITION_PRESERVE_KEYS = {workers, log_level, host, port}) from config_service.py so _strip_config_file_to_bootstrap() (config_service.py:2348) once again strips every non-BOOTSTRAP_KEYS key — which now includes the four launch keys (Proposal §9 C-A5 "Story 6 removes BOTH and strips them").
  • REMOVE the Story 1 config.json WRITE-path inclusion (MAJOR-5): save_config() (config_service.py:2466) on BOTH the PG path (config_service.py:2475-2476) and the SQLite path (config_service.py:2479-2480) must write only _extract_bootstrap_dict(config) again (no TRANSITION_PRESERVE_KEYS splice), so normal saves stop writing the four keys to config.json.
  • Bootstrap-only keys (server_dir, postgres_dsn, storage_mode, cluster, ontap, cluster.node_id) remain in config.json (Proposal §3.4, §9, §13).
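A minimal Python sketch of the post-cleanup behavior, assuming a flat JSON config.json: once the TRANSITION_PRESERVE_KEYS splice is gone, the boot-time strip and the save path reduce to the same bootstrap-only projection, so the four launch keys disappear on boot and never reappear on save. The names below (the BOOTSTRAP_KEYS contents, extract_bootstrap_dict, strip_config_file_to_bootstrap, save_config_file) are hypothetical stand-ins for the real config_service.py helpers, not their actual signatures.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Hypothetical stand-in for BOOTSTRAP_KEYS in config_service.py. node_id is
# nested under "cluster", so it is kept together with that key.
BOOTSTRAP_KEYS = frozenset(
    {"server_dir", "postgres_dsn", "storage_mode", "cluster", "ontap"}
)


def extract_bootstrap_dict(config: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only bootstrap keys. With no TRANSITION_PRESERVE_KEYS splice,
    workers/log_level/host/port are dropped like any other non-bootstrap key."""
    return {key: value for key, value in config.items() if key in BOOTSTRAP_KEYS}


def strip_config_file_to_bootstrap(path: Path) -> None:
    """Boot-time strip: rewrite config.json with bootstrap keys only."""
    data = json.loads(path.read_text())
    path.write_text(json.dumps(extract_bootstrap_dict(data), indent=2))


def save_config_file(path: Path, config: Dict[str, Any]) -> None:
    """Normal save (PG and SQLite paths alike): same bootstrap-only projection."""
    path.write_text(json.dumps(extract_bootstrap_dict(config), indent=2))
```

Routing both the strip and the save through one projection keeps AC1's two checks (after boot, and after a later save) from drifting apart.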

AC2: Readers no longer depend on config.json launch copies (two-snapshot precedence; DEPLOY NEVER reads launch.json — FIX-1 / MAJOR-M1)

Scenario: The runtime path and the auto-updater resolve launch values without config.json copies, and DEPLOY still never reads TARGET.

Given config.json has no workers/log_level/host/port copies
When get_config() resolves the four launch settings
Then it reads them from the shared runtime row (cluster) / runtime DB (solo)
When _ensure_launch_config() resolves the launch values in APPLY mode
Then it reads host/port/workers from launch.json, falling back to the ServerConfig defaults (NOT to config.json launch copies)
When _ensure_launch_config() resolves the launch values in DEPLOY mode
Then it reads host/port/workers from applied_launch.json, falling back to parsing/preserving the CURRENT live-unit ExecStart, then to ServerConfig defaults
And DEPLOY mode NEVER reads launch.json / TARGET (FIX-1 / MAJOR-M1) — removing the config.json rung does NOT re-add launch.json
And in NO mode is log_level written into ExecStart (it is in-process — CRITICAL-A)

Technical Requirements:

  • Remove the config.json fallback for the four launch settings from the runtime read path (Story 1 retained it; this story removes it). log_level continues to be read in-process from launch.json (Story 2 AC5), and its config.json fallback is removed here too, so the in-process read degrades launch.json → default.
  • Update _ensure_launch_config (deployment_executor.py) value precedence to DROP the config.json launch fallback added in Story 3, WITHOUT re-introducing launch.json/TARGET into DEPLOY (FIX-1 / MAJOR-M1):
    • APPLY: launch.json → ServerConfig defaults.
    • DEPLOY: applied_launch.json → parse/preserve the CURRENT live-unit ExecStart (keep its existing --host/--port/--workers) → ServerConfig defaults. NEVER launch.json/TARGET. (This mirrors Story 3's MAJOR-M1 corrupt-applied "preserve the live ExecStart" rule; after cleanup, the first-boot/no-APPLIED-yet rung becomes "preserve the live ExecStart" instead of config.json, because the config.json copy no longer exists and falling through to TARGET is forbidden.)
  • The canonical defaults remain the ServerConfig literals (host="127.0.0.1", port=8000, workers=1; config_manager.py:1123-1125). Per MINOR-8 the port default is 8000, not 8090. --log-level is NOT in ExecStart (CRITICAL-A).
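The two-snapshot precedence can be sketched as a pure function, which also makes the FIX-1 (a) test straightforward: in DEPLOY mode the launch.json snapshot is never consulted, and config.json is not a source in any mode. This is a hypothetical illustration, not the deployment_executor.py code; the function names, the per-key fallback (rather than falling back a whole snapshot at once) and the ExecStart flag syntax (`--flag value` or `--flag=value`) are all assumptions.

```python
import re
from typing import Dict, Literal, Optional

# ServerConfig defaults quoted in this story (config_manager.py:1123-1125).
DEFAULTS: Dict[str, object] = {"host": "127.0.0.1", "port": 8000, "workers": 1}
LAUNCH_KEYS = ("host", "port", "workers")

# Accepts "--port 8000" and "--port=8000" (assumed ExecStart flag syntax).
_FLAG_RE = re.compile(r"--(host|port|workers)[=\s]+(\S+)")


def parse_exec_start(exec_start: Optional[str]) -> Dict[str, object]:
    """Preserve the live unit's existing --host/--port/--workers values."""
    found: Dict[str, object] = {}
    for flag, value in _FLAG_RE.findall(exec_start or ""):
        if flag == "host":
            found[flag] = value
        elif value.isdigit():
            found[flag] = int(value)
    return found


def resolve_launch_values(
    mode: Literal["APPLY", "DEPLOY"],
    launch: Optional[dict],          # parsed launch.json (TARGET)
    applied: Optional[dict],         # parsed applied_launch.json (APPLIED)
    live_exec_start: Optional[str],  # ExecStart of the currently installed unit
) -> Dict[str, object]:
    if mode == "APPLY":
        # APPLY: launch.json -> ServerConfig defaults.
        sources = [launch or {}]
    else:
        # DEPLOY: applied_launch.json -> live ExecStart -> defaults.
        # launch.json (TARGET) is deliberately never consulted here (FIX-1).
        sources = [applied or {}, parse_exec_start(live_exec_start)]
    resolved: Dict[str, object] = {}
    for key in LAUNCH_KEYS:
        resolved[key] = next(
            (src[key] for src in sources if src.get(key) is not None),
            DEFAULTS[key],
        )
    return resolved
```

In this sketch a corrupt applied_launch.json would be passed as None and land on the live-ExecStart rung, matching the Story 3 MAJOR-M1 "preserve the live ExecStart" rule cited above.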

AC3: The applied-worker-count resolver drops its config.json fallback in lockstep (FIX-1)

Scenario: After cleanup, the governor/cache worker-count resolver no longer has a config.json rung; a node missing applied_launch.json falls to the default 1.

Given the config.json launch copies have been removed (AC1)
And the Story 1 applied-worker-count resolver previously used the chain applied_launch.json.workers -> config.json -> 1
When the resolver is invoked and applied_launch.json records workers=4
Then it returns 4 (APPLIED, unchanged)
When the resolver is invoked and applied_launch.json does NOT exist (or has no workers)
Then it returns the ServerConfig default 1
And it does NOT read config.json (the config.json rung has been removed in lockstep with the launch-copy removal)

Technical Requirements:

  • FIX-1: drop the config.json rung from the applied-worker-count resolver in the SAME release as the config.json launch-copy removal. Story 1 introduced the resolver as applied_launch.json.workers → config.json → ServerConfig default 1; after Story 6 removes the config.json launch copies, the post-cleanup resolver chain is applied_launch.json.workers → ServerConfig default 1 (no config.json rung).
  • Keep the resolver DB-free and fail-soft (max(1, value), default 1 on absent/error), unchanged except for removing the config.json rung.
  • The governor (provider_concurrency_governor.py:346-364) and cache initializer (service_init.py:124-144) continue to consume the resolver (NOT get_config().workers), unchanged except that the resolver no longer reads config.json.
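A minimal sketch of the post-cleanup resolver, assuming it receives the path to applied_launch.json (the function name and parameter are hypothetical). It keeps the DB-free, fail-soft contract described above: a missing file, unreadable JSON, or an absent or non-numeric workers value all yield 1, a zero or negative count is clamped by max(1, value), and config.json is never opened.

```python
import json
from pathlib import Path

DEFAULT_WORKERS = 1  # ServerConfig default


def resolve_applied_worker_count(applied_launch_path: Path) -> int:
    """Post-cleanup chain: applied_launch.json.workers -> 1 (no config.json rung).

    DB-free and fail-soft: any read/parse/shape error falls back to 1.
    """
    try:
        data = json.loads(applied_launch_path.read_text())
        value = int(data["workers"])
    except (OSError, ValueError, KeyError, TypeError):
        # Missing file, bad JSON, no "workers", or a non-numeric value.
        return DEFAULT_WORKERS
    return max(1, value)
```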

AC4: No regression for solo or cluster

Scenario: Solo and cluster deployments operate after the cleanup.

Given a solo deployment on the new release with launch values in the runtime DB + launch.json
When the server starts and an operator applies a restart
Then host/port/workers apply correctly without any config.json launch copy
And log_level applies in-process from launch.json without any config.json launch copy
Given a cluster deployment on the new release
When an admin edits a launch setting and applies a restart
Then it propagates and applies cluster-wide without any config.json launch copy
And a routine code deploy (no diagnostics restart) does NOT apply the saved-but-unconfirmed change (DEPLOY reads applied_launch.json, never launch.json)

Technical Requirements:

  • Verify end-to-end solo + cluster correctness with no config.json launch copies present (Proposal §8, §9).
  • No new table, no migration, no DROP/RENAME (Proposal §6, §13) — this is purely removing the file-copy write + the strip-guard + the fallback read + the resolver's config.json rung.

AC5: Release-gating guard

Scenario: This cleanup must not ship in the same release as Stories 1-5.

Given Stories 1-5 shipped in release R
When this cleanup is scheduled
Then it ships no earlier than release R+1 (after the back-compat window, all nodes new)

Technical Requirements:

  • This story is explicitly a NEXT-RELEASE follow-up; it MUST NOT ship in the same release as Stories 1-5 (Proposal §9 back-compat window). Record the release-gating in the story so it is not implemented prematurely. Removing EITHER transition mechanism (the strip-guard OR the write-path inclusion) before all nodes are new would strip the old-node fallback prematurely — the exact failure C-A5 / MAJOR-5 guard against.
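AC5 is a scheduling rule rather than runtime behavior, but if the team wants the "documented/guarded" check to be executable, one option is a pre-release check along these lines. Everything in it is hypothetical: the version tuples, the assumption that each node's running release can be collected, and the function name. It encodes only the two conditions stated above: the cleanup release is later than R, and no node is still running code older than R.

```python
from typing import Iterable, Tuple

Release = Tuple[int, int, int]  # hypothetical (major, minor, patch)


def cleanup_may_ship(
    transition_release: Release,       # R: the release that shipped Stories 1-5
    cleanup_release: Release,          # release this story is scheduled into
    node_releases: Iterable[Release],  # release currently running on each node
) -> bool:
    """True only if the cleanup is at least R+1 and every node already runs >= R."""
    nodes = list(node_releases)
    return (
        cleanup_release > transition_release
        and bool(nodes)
        and min(nodes) >= transition_release
    )
```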

Technical Implementation Details

Component Structure

src/code_indexer/server/services/config_service.py
  - REMOVE the Story 1 TRANSITION_PRESERVE_KEYS strip-guard so _strip_config_file_to_bootstrap (:2348) strips the four keys
  - REMOVE the Story 1 config.json WRITE-path inclusion in save_config (:2466) on BOTH the PG path (:2475-2476)
    and the SQLite path (:2479-2480) so normal saves write only _extract_bootstrap_dict again (MAJOR-5)
  - remove config.json fallback for these four from the runtime read path (incl. the in-process log_level read)
src/code_indexer/server/auto_update/deployment_executor.py
  - _ensure_launch_config value precedence: drop config.json launch fallback WITHOUT re-adding launch.json to DEPLOY (FIX-1)
      APPLY:  launch.json -> ServerConfig defaults
      DEPLOY: applied_launch.json -> parse/preserve the CURRENT live-unit ExecStart -> ServerConfig defaults
              (NEVER launch.json/TARGET — FIX-1 / MAJOR-M1)
    (log_level NOT in ExecStart — CRITICAL-A; port 8000 default — MINOR-8)
Applied-worker-count resolver (Story 1) — FIX-1:
  - drop the config.json rung: post-cleanup chain = applied_launch.json.workers -> ServerConfig default 1
Bootstrap-only keys in config.json remain untouched.
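For the ExecStart side of the rewrite, a sketch of rendering the unit line from already-resolved values shows the CRITICAL-A rule directly: only --host/--port/--workers are emitted, and any log_level value passed in is ignored because it is read in-process. The executable path and the use of shlex.join for quoting are assumptions; systemd's ExecStart quoting is close to, but not identical with, POSIX shell quoting.

```python
import shlex
from typing import Mapping


def build_exec_start(executable: str, values: Mapping[str, object]) -> str:
    """Render ExecStart from resolved host/port/workers only.

    log_level is never emitted, even if present in `values` (CRITICAL-A).
    """
    args = [
        executable,
        "--host", str(values["host"]),
        "--port", str(values["port"]),
        "--workers", str(values["workers"]),
    ]
    return shlex.join(args)
```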

Testing Requirements

Unit Test Coverage (write first — TDD)

  • After removing the strip-guard, _strip_config_file_to_bootstrap() strips the four launch keys from config.json; bootstrap keys remain (AC1).
  • After removing the write-path inclusion, a normal save_config() no longer writes the four launch keys to config.json on EITHER the PG or SQLite path (AC1, MAJOR-5).
  • get_config() resolves the four from the runtime DB only; _ensure_launch_config APPLY resolves from launch.json → ServerConfig defaults only; DEPLOY resolves from applied_launch.json → parse/preserve the live ExecStart → ServerConfig defaults only; no config.json launch fallback in any mode; no log_level in ExecStart (AC2).
  • FIX-1 (a) — DEPLOY never applies a saved TARGET after cleanup: with a saved-but-unconfirmed launch change present in the DB/launch.json (TARGET = workers 8) and applied_launch.json recording workers=4, a DEPLOY run after cleanup rewrites ExecStart from applied_launch.json (workers=4), NEVER from launch.json (workers=8); the saved TARGET is NOT applied. Also: with applied_launch.json ABSENT after cleanup, DEPLOY parses/preserves the current live ExecStart (NOT TARGET) (AC2).
  • FIX-1 (b) — resolver drops config.json after cleanup: with applied_launch.json ABSENT, the applied-worker-count resolver returns the ServerConfig default 1, NOT a value from config.json (proves the config.json rung was removed in lockstep) (AC3).
  • Resolver still returns the APPLIED count when applied_launch.json is present (workers=4 → 4) (AC3) — regression.
  • Solo + cluster correctness with no config.json launch copies (AC4).
  • Release-gating documented/guarded (AC5).

Integration / Server Test Coverage

  • Server code touched → run server-fast-automation.sh (zero failures).

Staging / E2E

  • Full solo + cluster flow without config.json launch copies — validated in e2e-automation.sh and/or staging (Proposal §8, §9). Confirm a routine deploy after a saved-but-unconfirmed change does NOT apply it (DEPLOY reads applied_launch.json). Front-door rule applies.

Front-Door Rule

  • Server-behavior verification goes through the REST/MCP front door; CLI/SSH only for inspecting config.json/launch.json/applied_launch.json/unit files during troubleshooting.

Definition of Done

Functional Completion

  • AC1-AC5 satisfied with evidence (failing tests first).
  • BOTH Story 1 transition mechanisms removed (strip-guard AND write-path inclusion, MAJOR-5); config.json no longer carries the four launch copies; readers no longer depend on them.
  • FIX-1: _ensure_launch_config value precedence updated to the two-snapshot model with no config.json launch fallback AND no re-introduction of launch.json/TARGET into DEPLOY (APPLY launch.json→defaults; DEPLOY applied_launch.json→preserve-live-ExecStart→defaults, NEVER launch.json); log_level stays in-process.
  • FIX-1: applied-worker-count resolver's config.json rung dropped in lockstep (post-cleanup chain applied_launch.json.workers → default 1); proven a node missing applied_launch.json falls to 1, not config.json.

Quality Validation

  • ./lint.sh exits 0.
  • server-fast-automation.sh green.
  • Code review approved.
  • Manual verification evidence captured (solo + cluster apply without config.json launch copies; DEPLOY does not apply a saved TARGET; resolver falls to default 1 without applied_launch.json).

Integration Readiness

  • No new table/migration; no DROP/RENAME (Proposal §6, §13).
  • Ships in a later release than Stories 1-5 (Proposal §9).
  • Completes the single-source-of-truth migration for the four launch settings without re-opening the over-apply bug (DEPLOY never reads TARGET) and without a stale config.json worker-count rung.

Priority: Medium (next-release follow-up; not blocking the functional epic outcome delivered by Stories 1-5).
Dependencies: Stories 1-5 must be shipped and the back-compat window elapsed (all nodes upgraded) before this ships.
Success Metric: BOTH Story 1 transition mechanisms (strip-guard AND write-path inclusion) are removed and config.json no longer carries workers/log_level/host/port; the shared DB + launch.json + applied_launch.json are the sole source, with _ensure_launch_config degrading to the two-snapshot precedence (no config.json launch fallback, DEPLOY still NEVER reads launch.json/TARGET — FIX-1) and log_level in-process, and the applied-worker-count resolver dropping its config.json rung in lockstep — no regression in solo or cluster, and no re-opening of the over-apply bug.
