fix(db): unify state history tables#2957
Conversation
Signed-off-by: Hasan Khan <hasank@nvidia.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Summary by CodeRabbit
WalkthroughState-history tables are normalized to ChangesState History Unification
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/api-db/migrations/20260628120000_unify_state_history_tables.sql`:
- Around line 80-158: The retention triggers in
machine_state_history_keep_limit, network_segment_state_history_keep_limit,
vpc_prefix_state_history_keep_limit, dpa_interface_state_history_keep_limit,
ib_partition_state_history_keep_limit, power_shelf_state_history_keep_limit,
rack_state_history_keep_limit, and switch_state_history_keep_limit can race
under concurrent inserts for the same object_id. Add per-object serialization
inside each trigger body before the DELETE/overflow calculation, using a lock or
equivalent mechanism keyed by NEW.object_id, so only one transaction performs
retention cleanup for a given object at a time. Keep the existing 250-row cap
logic, but ensure it runs after the lock is acquired.
In `@crates/api-db/src/dpa_interface.rs`:
- Around line 309-310: The history JSON built in the DPA interface query is not
ordered deterministically, so align the aggregate in the history_agg subquery
with state_history::for_object by preserving a stable sort before json_agg.
Update the query around dpa_interface_state_history and the
json_agg/json_build_object expression so include_history=true returns history
entries in the same id ASC order every time.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: d93c359a-59de-4c3f-a8d7-095f96b2f97d
📒 Files selected for processing (14)
crates/api-core/src/handlers/power_shelf.rscrates/api-core/src/handlers/rack.rscrates/api-core/src/handlers/switch.rscrates/api-core/src/tests/machine_history.rscrates/api-core/src/tests/power_shelf.rscrates/api-core/src/tests/rack_find.rscrates/api-core/src/tests/switch.rscrates/api-core/src/tests/vpc_prefix.rscrates/api-db/migrations/20260628120000_unify_state_history_tables.sqlcrates/api-db/src/dpa_interface.rscrates/api-db/src/network_segment.rscrates/api-db/src/sql/machine_snapshot_history_join.snippetcrates/api-db/src/sql/managed_host_history_join.snippetcrates/api-db/src/state_history.rs
Signed-off-by: Hasan Khan <hasank@nvidia.com>
|
@coderabbitai review |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
crates/api-db/src/state_history.rs (1)
236-288: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick winJoin the spawned writer even when the wait assertion fails.
timeout(...)?can return beforesecond_insert.await, which drops theJoinHandleand detaches the DB task on failure. Capture the wait result, release the first transaction, await the spawned writer, then propagate the original error.Proposed cleanup ordering
- tokio::time::timeout(std::time::Duration::from_secs(5), async { + let wait_result = tokio::time::timeout(std::time::Duration::from_secs(5), async { loop { let waiting: bool = sqlx::query_scalar( @@ - }) - .await - .map_err(|_| { + }) + .await + .map_err(|_| { std::io::Error::other(format!( "second writer did not wait for {table_name} retention lock", )) - })??; + }); drop(observer); first_txn.commit().await?; - second_insert.await?.map_err(std::io::Error::other)?; + let second_result = second_insert.await?.map_err(std::io::Error::other); + wait_result??; + second_result?;As per coding guidelines, avoid spawning background tasks without joining them; as per path instructions, prioritize concurrency and resource-lifetime findings.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/api-db/src/state_history.rs` around lines 236 - 288, The spawned writer task in the state history test is being detached when the wait assertion times out, because `timeout(...)?` returns before `second_insert.await` and the `JoinHandle` is dropped. Update the flow around `second_insert`, `pid_receiver`, and `first_txn.commit` so you capture the wait result first, always release/commit the first transaction, then await `second_insert` before returning or propagating the original error. Keep the original timeout/assertion error, but ensure the spawned DB task is joined on both success and failure paths.Sources: Coding guidelines, Path instructions
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@crates/api-db/src/state_history.rs`:
- Around line 236-288: The spawned writer task in the state history test is
being detached when the wait assertion times out, because `timeout(...)?`
returns before `second_insert.await` and the `JoinHandle` is dropped. Update the
flow around `second_insert`, `pid_receiver`, and `first_txn.commit` so you
capture the wait result first, always release/commit the first transaction, then
await `second_insert` before returning or propagating the original error. Keep
the original timeout/assertion error, but ensure the spawned DB task is joined
on both success and failure paths.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 14ac5f81-b88e-419a-8f42-4d898de2967e
📒 Files selected for processing (3)
crates/api-db/migrations/20260628120000_unify_state_history_tables.sqlcrates/api-db/src/dpa_interface.rscrates/api-db/src/state_history.rs
🚧 Files skipped from review as they are similar to previous changes (2)
- crates/api-db/src/dpa_interface.rs
- crates/api-db/migrations/20260628120000_unify_state_history_tables.sql
|
@coderabbitai review |
✅ Action performedReview finished.
|
Signed-off-by: Hasan Khan <hasank@nvidia.com>
🔍 Container Scan Summary
Per-CVE detail lives in the per-service |
Description
Normalize all eight state-history tables around a common five-column layout with an
object_id TEXTresource key.This change:
Related issues
Closes #1136
Type of Change
Breaking Changes
Testing
Completed validation:
cargo test --locked -p carbide-api-db -- --test-threads=1— 217 passed; doc tests 3 passed, 1 ignored.cargo test --locked -p carbide-api-db dpa_interface::test::deleting_ -- --nocapture --test-threads=1— 2 passed.cargo test --locked -p carbide-api-core test_force_delete_ -- --nocapture --test-threads=1— 9 passed.machine_historyfilter — 2 passed.cargo clippy --locked -p carbide-api-db -p carbide-api-core --all-targets --all-features— passed.cargo fmt --all -- --check— passed.origin/main: seeded all eight legacy tables, including NULL power-shelf/switch timestamps, then applied this migration. All eight rows were preserved; resource IDs becameTEXT; timestamps were non-null; every table had the common five-column layout and zero foreign keys.Additional Notes
The migration renames internal history columns in place. As with existing direct-rename migrations in this repository, old application binaries are not compatible with the post-migration column names during the rollout window.