fix(db): unify state history tables #2957

Open
osu wants to merge 3 commits into NVIDIA:main from osu:fix/1136-unify-state-history-tables
Conversation

osu commented Jun 28, 2026
Member

Description

Normalize all eight state-history tables around a common five-column layout with an object_id TEXT resource key.

This change:

  • makes timestamps mandatory and backfills existing NULL values;
  • removes cascading resource foreign keys so history survives deletion;
  • updates shared accessors, direct joins, retention triggers, indexes, and machine cleanup;
  • retains power-shelf, rack, switch, DPA-interface, and machine-cleanup history;
  • adds schema, 250-record retention, deletion-retention, and DPA history-query coverage.
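
The retention and deletion behaviour listed above can be pictured with a small in-memory model. This is only a sketch, not the SQL from the migration: `HistoryStore`, `HistoryRow`, `RETENTION_LIMIT`, and every field name other than `object_id` are hypothetical, and a `HashMap` stands in for a history table together with its per-object retention trigger.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Cap taken from the "250-record retention" item in the description.
const RETENTION_LIMIT: usize = 250;

/// Hypothetical row shape; only `object_id` is named in the description.
#[derive(Debug, Clone, PartialEq)]
struct HistoryRow {
    id: u64,
    object_id: String, // TEXT key with no foreign key to the resource table
    state: String,
}

/// In-memory stand-in for one history table and its retention trigger.
#[derive(Default)]
struct HistoryStore {
    next_id: u64,
    rows: HashMap<String, VecDeque<HistoryRow>>,
}

impl HistoryStore {
    /// Append a row, then drop the oldest rows of that object beyond the cap.
    fn record(&mut self, object_id: &str, state: &str) {
        self.next_id += 1;
        let row = HistoryRow {
            id: self.next_id,
            object_id: object_id.to_string(),
            state: state.to_string(),
        };
        let rows = self.rows.entry(object_id.to_string()).or_default();
        rows.push_back(row);
        while rows.len() > RETENTION_LIMIT {
            rows.pop_front();
        }
    }

    /// Rows for one object, oldest first.
    fn for_object(&self, object_id: &str) -> Vec<HistoryRow> {
        self.rows
            .get(object_id)
            .map(|rows| rows.iter().cloned().collect())
            .unwrap_or_default()
    }
}

fn main() {
    let mut live_racks: HashSet<String> = HashSet::new();
    let mut history = HistoryStore::default();

    live_racks.insert("rack-1".to_string());
    for i in 0..300 {
        history.record("rack-1", &format!("state-{}", i));
    }

    // Force-delete the resource; nothing cascades into the history store.
    live_racks.remove("rack-1");

    let kept = history.for_object("rack-1");
    println!("rack still exists: {}", live_racks.contains("rack-1"));
    println!("history rows kept: {}", kept.len());
    println!("oldest kept row: {:?}", kept.first());
}
```

Because the history is keyed by a plain string rather than a foreign key, removing the resource leaves its rows untouched, and the cap applies per `object_id` rather than per table.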

Related issues

Closes #1136

Type of Change

  • Add - New feature or capability
  • Change - Changes in existing functionality
  • Fix - Bug fixes
  • Remove - Removed features or deprecated functionality
  • Internal - Internal changes (refactoring, tests, docs, etc.)

Breaking Changes

  • This PR contains breaking changes

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Completed validation:

  • cargo test --locked -p carbide-api-db -- --test-threads=1 — 217 passed; doc tests 3 passed, 1 ignored.
  • cargo test --locked -p carbide-api-db dpa_interface::test::deleting_ -- --nocapture --test-threads=1 — 2 passed.
  • Linux: cargo test --locked -p carbide-api-core test_force_delete_ -- --nocapture --test-threads=1 — 9 passed.
  • Linux: machine_history filter — 2 passed.
  • Linux: exact VPC-prefix renamed-column test — 1 passed.
  • Linux: exact network-segment retention test — 1 passed.
  • Linux: cargo clippy --locked -p carbide-api-db -p carbide-api-core --all-targets --all-features — passed.
  • Pinned nightly cargo fmt --all -- --check — passed.
  • Populated-upgrade rehearsal from origin/main: seeded all eight legacy tables, including NULL power-shelf/switch timestamps, then applied this migration. All eight rows were preserved; resource IDs became TEXT; timestamps were non-null; every table had the common five-column layout and zero foreign keys.

Additional Notes

The migration renames internal history columns in place. As with existing direct-rename migrations in this repository, old application binaries are not compatible with the post-migration column names during the rollout window.

Signed-off-by: Hasan Khan <hasank@nvidia.com>
osu requested a review from a team as a code owner on June 28, 2026 18:40

coderabbitai Bot commented Jun 28, 2026

Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 6ee1cded-e54f-4b90-8e92-669b134e8a15

📥 Commits

Reviewing files that changed from the base of the PR and between c46d99e and 219e815.

📒 Files selected for processing (1)
  • crates/api-db/src/state_history.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/api-db/src/state_history.rs

Summary by CodeRabbit

  • Bug Fixes

    • Force-deleting racks, switches, and power shelves now preserves their state history instead of removing it.
    • State history lookups and related history views now return consistent results across affected resources.
    • Cleanup and delete flows were adjusted so historical records remain available after removal.
  • Tests

    • Expanded coverage to verify state history is retained after force delete actions.
    • Added broader validation for history storage, retrieval, and retention behavior.

Walkthrough

State-history tables are normalized to object_id, timestamps are standardized, and retention triggers are rebuilt. Query helpers, SQL joins, and force-delete handlers are updated to use the unified key and preserve history across hard deletes. Tests were updated to validate the new behavior.

Changes

State History Unification

| Layer | File(s) | Summary |
|---|---|---|
| DB migration and retention triggers | crates/api-db/migrations/20260628120000_unify_state_history_tables.sql | Drops cascading foreign keys, normalizes timestamps, renames history key columns to object_id, standardizes indexes, recreates per-object retention triggers, and adds machine cleanup without deleting state history. |
| state_history API and storage helpers | crates/api-db/src/state_history.rs | The state-history module switches to fixed object_id queries and string binding, removes per-table object_id column selection and delete_by_object_id, and expands integration coverage for schema shape, persistence, lookup, renaming, retention, and concurrency. |
| History joins and DPA interface queries | crates/api-db/src/sql/machine_snapshot_history_join.snippet, crates/api-db/src/sql/managed_host_history_join.snippet, crates/api-db/src/network_segment.rs, crates/api-db/src/dpa_interface.rs | The machine, managed host, network segment, and DPA interface history queries now join on object_id, and DPA interface deletion no longer removes state-history rows before deleting the interface. |
| Force-delete handlers retain history | crates/api-core/src/handlers/power_shelf.rs, crates/api-core/src/handlers/rack.rs, crates/api-core/src/handlers/switch.rs | The power shelf, rack, and switch force-delete handlers remove explicit state-history deletion calls and update their documentation to state that history is retained. |
| Force-delete and SQL test updates | crates/api-core/src/tests/power_shelf.rs, crates/api-core/src/tests/rack_find.rs, crates/api-core/src/tests/switch.rs, crates/api-core/src/tests/machine_history.rs, crates/api-core/src/tests/vpc_prefix.rs | The force-delete tests seed retained history before deletion and verify it remains afterward, while the machine history and VPC prefix tests switch to object_id-based SQL and string parameters. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title is concise and accurately reflects the main change: unifying state-history tables in the database. |
| Description check | ✅ Passed | The description is clearly about the same state-history unification work and related fixes. |
| Linked Issues check | ✅ Passed | The changes match #1136 by making timestamps mandatory, removing FK-based deletion, and standardizing object_id with 250-row retention. |
| Out of Scope Changes check | ✅ Passed | No clear unrelated code changes stand out; the migration, accessors, joins, and tests all support the stated state-history unification work. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
coderabbitai Bot left a comment
Contributor

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/api-db/migrations/20260628120000_unify_state_history_tables.sql`:
- Around line 80-158: The retention triggers in
machine_state_history_keep_limit, network_segment_state_history_keep_limit,
vpc_prefix_state_history_keep_limit, dpa_interface_state_history_keep_limit,
ib_partition_state_history_keep_limit, power_shelf_state_history_keep_limit,
rack_state_history_keep_limit, and switch_state_history_keep_limit can race
under concurrent inserts for the same object_id. Add per-object serialization
inside each trigger body before the DELETE/overflow calculation, using a lock or
equivalent mechanism keyed by NEW.object_id, so only one transaction performs
retention cleanup for a given object at a time. Keep the existing 250-row cap
logic, but ensure it runs after the lock is acquired.

In `@crates/api-db/src/dpa_interface.rs`:
- Around line 309-310: The history JSON built in the DPA interface query is not
ordered deterministically, so align the aggregate in the history_agg subquery
with state_history::for_object by preserving a stable sort before json_agg.
Update the query around dpa_interface_state_history and the
json_agg/json_build_object expression so include_history=true returns history
entries in the same id ASC order every time.
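
The first finding asks that retention cleanup for a given `object_id` run under a per-object lock. The sketch below models that idea in plain Rust, with threads in place of PostgreSQL transactions; it is not the trigger code from the migration. `LockedHistory`, `insert_with_retention`, and the one-`Mutex`-per-object layout are illustrative stand-ins for whatever lock keyed by `NEW.object_id` the migration ends up using.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Same 250-row cap the review comment asks to keep.
const RETENTION_LIMIT: usize = 250;

/// One lock per object_id, so writers for different objects do not block each other.
#[derive(Default)]
struct LockedHistory {
    per_object: Mutex<HashMap<String, Arc<Mutex<Vec<u64>>>>>,
}

impl LockedHistory {
    /// Look up (or create) the lock for one object; the map lock is held only briefly.
    fn slot(&self, object_id: &str) -> Arc<Mutex<Vec<u64>>> {
        let mut map = self.per_object.lock().unwrap();
        map.entry(object_id.to_string()).or_default().clone()
    }

    /// Insert and trim while holding the object's lock, so the overflow
    /// calculation never interleaves with another writer for the same object.
    fn insert_with_retention(&self, object_id: &str, row_id: u64) {
        let slot = self.slot(object_id);
        let mut rows = slot.lock().unwrap();
        rows.push(row_id);
        if rows.len() > RETENTION_LIMIT {
            let overflow = rows.len() - RETENTION_LIMIT;
            rows.drain(..overflow);
        }
    }

    /// Number of rows currently kept for one object.
    fn len(&self, object_id: &str) -> usize {
        let slot = self.slot(object_id);
        let rows = slot.lock().unwrap();
        rows.len()
    }
}

fn main() {
    let history = Arc::new(LockedHistory::default());

    // Four concurrent writers target the same object.
    let writers: Vec<_> = (0..4u64)
        .map(|w| {
            let history = Arc::clone(&history);
            thread::spawn(move || {
                for i in 0..200 {
                    history.insert_with_retention("switch-1", w * 1_000 + i);
                }
            })
        })
        .collect();

    for writer in writers {
        writer.join().unwrap();
    }
    println!("rows kept: {}", history.len("switch-1"));
}
```

The cap holds regardless of how the writers interleave because the push and the trim happen inside one critical section per object.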
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d93c359a-59de-4c3f-a8d7-095f96b2f97d

📥 Commits

Reviewing files that changed from the base of the PR and between d3395d8 and 9fd867f.

📒 Files selected for processing (14)
  • crates/api-core/src/handlers/power_shelf.rs
  • crates/api-core/src/handlers/rack.rs
  • crates/api-core/src/handlers/switch.rs
  • crates/api-core/src/tests/machine_history.rs
  • crates/api-core/src/tests/power_shelf.rs
  • crates/api-core/src/tests/rack_find.rs
  • crates/api-core/src/tests/switch.rs
  • crates/api-core/src/tests/vpc_prefix.rs
  • crates/api-db/migrations/20260628120000_unify_state_history_tables.sql
  • crates/api-db/src/dpa_interface.rs
  • crates/api-db/src/network_segment.rs
  • crates/api-db/src/sql/machine_snapshot_history_join.snippet
  • crates/api-db/src/sql/managed_host_history_join.snippet
  • crates/api-db/src/state_history.rs

Comment thread crates/api-db/src/dpa_interface.rs Outdated
Signed-off-by: Hasan Khan <hasank@nvidia.com>
osu commented Jun 28, 2026
Member Author

@coderabbitai review

coderabbitai Bot commented Jun 28, 2026
Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai Bot left a comment
Contributor

🧹 Nitpick comments (1)
crates/api-db/src/state_history.rs (1)

236-288: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick win

Join the spawned writer even when the wait assertion fails.

timeout(...)? can return before second_insert.await, which drops the JoinHandle and detaches the DB task on failure. Capture the wait result, release the first transaction, await the spawned writer, then propagate the original error.

Proposed cleanup ordering
-        tokio::time::timeout(std::time::Duration::from_secs(5), async {
+        let wait_result = tokio::time::timeout(std::time::Duration::from_secs(5), async {
             loop {
                 let waiting: bool = sqlx::query_scalar(
@@
-        })
-        .await
-        .map_err(|_| {
+        })
+        .await
+        .map_err(|_| {
             std::io::Error::other(format!(
                 "second writer did not wait for {table_name} retention lock",
             ))
-        })??;
+        });
         drop(observer);
 
         first_txn.commit().await?;
-        second_insert.await?.map_err(std::io::Error::other)?;
+        let second_result = second_insert.await?.map_err(std::io::Error::other);
+        wait_result??;
+        second_result?;

As per coding guidelines, avoid spawning background tasks without joining them; as per path instructions, prioritize concurrency and resource-lifetime findings.
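
The same cleanup ordering can be shown without tokio or a database. The sketch below is a standard-library analogue, not the test in `state_history.rs`: a `Mutex` stands in for the retention lock held by the first transaction, a thread stands in for the spawned writer, and `wait_then_join` is a hypothetical helper. The point is only the order of operations: capture the wait result, release the lock, join the writer, then propagate the original error.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Returns Err if the "waiting" signal does not arrive in time, but only
/// after the lock is released and the writer thread has been joined.
fn wait_then_join(signal_waiting: bool, timeout: Duration) -> Result<u32, String> {
    let lock = Arc::new(Mutex::new(0u32));
    // Stands in for the first transaction, which holds the retention lock.
    let first_holder = lock.lock().unwrap();

    let (tx, rx) = mpsc::channel::<()>();
    let writer_lock = Arc::clone(&lock);
    let writer = thread::spawn(move || {
        if signal_waiting {
            // "I am about to block on the lock."
            let _ = tx.send(());
        }
        let mut value = writer_lock.lock().unwrap(); // blocks until released
        *value += 1;
        *value
    });

    // Capture the wait result instead of returning early with `?`.
    let wait_result = rx
        .recv_timeout(timeout)
        .map_err(|_| "second writer did not signal that it was waiting".to_string());

    // Release the lock on both the success and the failure path.
    drop(first_holder);

    // Always join the writer before leaving the function.
    let written = writer
        .join()
        .map_err(|_| "writer thread panicked".to_string())?;

    // Only now propagate the original wait error, if any.
    wait_result?;
    Ok(written)
}

fn main() {
    println!("{:?}", wait_then_join(true, Duration::from_millis(500)));
    println!("{:?}", wait_then_join(false, Duration::from_millis(50)));
}
```

In the failing call the writer never signals, so the wait times out, yet the thread is still unblocked and joined before the error is returned.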

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/api-db/src/state_history.rs` around lines 236 - 288, The spawned
writer task in the state history test is being detached when the wait assertion
times out, because `timeout(...)?` returns before `second_insert.await` and the
`JoinHandle` is dropped. Update the flow around `second_insert`, `pid_receiver`,
and `first_txn.commit` so you capture the wait result first, always
release/commit the first transaction, then await `second_insert` before
returning or propagating the original error. Keep the original timeout/assertion
error, but ensure the spawned DB task is joined on both success and failure
paths.

Sources: Coding guidelines, Path instructions

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 14ac5f81-b88e-419a-8f42-4d898de2967e

📥 Commits

Reviewing files that changed from the base of the PR and between 9fd867f and c46d99e.

📒 Files selected for processing (3)
  • crates/api-db/migrations/20260628120000_unify_state_history_tables.sql
  • crates/api-db/src/dpa_interface.rs
  • crates/api-db/src/state_history.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • crates/api-db/src/dpa_interface.rs
  • crates/api-db/migrations/20260628120000_unify_state_history_tables.sql

osu commented Jun 28, 2026
Member Author

@coderabbitai review

coderabbitai Bot commented Jun 28, 2026
Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Signed-off-by: Hasan Khan <hasank@nvidia.com>
github-actions Bot commented Jun 28, 2026

🔍 Container Scan Summary

| Service | Total | Critical | High | Medium | Low | Other |
|---|---|---|---|---|---|---|
| boot-artifacts-aarch64 | 3 | 0 | 0 | 3 | 0 | 0 |
| boot-artifacts-x86_64 | 3 | 0 | 0 | 3 | 0 | 0 |
| forge-admin-cli-x86_64 | 285 | 6 | 25 | 103 | 7 | 144 |
| machine-validation-runner | 748 | 30 | 189 | 272 | 36 | 221 |
| machine_validation | 748 | 30 | 189 | 272 | 36 | 221 |
| machine_validation-aarch64 | 748 | 30 | 189 | 272 | 36 | 221 |
| nvmetal-carbide | 748 | 30 | 189 | 272 | 36 | 221 |
| TOTAL | 3283 | 126 | 781 | 1197 | 151 | 1028 |

Per-CVE detail lives in the per-service grype-* artifacts (JSON + SARIF). Severity counts only — no CVE IDs published here.

osu self-assigned this Jun 28, 2026

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unify state history tables

1 participant