Is this a new feature, an enhancement, or a change to existing functionality?
New Feature
How would you describe the priority of this feature request?
High
Please provide a clear description of the problem this feature solves
The health checks provide a set of validation checks that can be executed directly on a node; they return a pass/fail result along with diagnostic output.
- NICo will run the checks on the node
- NICo will read the output and make decisions
- The health check binary does not take any action itself — it only reports results
The goal is to integrate this capability into NICo in a simple and incremental way, starting with a narrow use case.
Feature Description
✅ In Scope
- Run health checks directly on the node (in-band)
- Trigger execution from NICo during the post-repair (break/fix) workflow
- Capture results:
a. Pass / Fail
b. Diagnostic output
- Use results to decide whether the node can return to the fleet
❌ Out of Scope
- Replacing existing NICo validation logic
- Multi-node or cluster-level validation
- Out-of-band integration
- Continuous monitoring integration
- Deep integration into health check source code
Describe your ideal solution
- Execute Health Checks
a. NICo triggers checks on the node
b. Execution happens in the post-repair validation step
- Capture and Parse Results
Read output:
a. Pass / Fail (exit code or equivalent)
b. Diagnostic details
Normalize results for NICo usage
- Decision Handling in NICo
NICo evaluates:
a. Pass → Node is healthy → return to pool
b. Fail → Node remains in repair
Catalog only reports; NICo makes decisions
- Workflow Integration
Integrate into:
a. Break/fix post-repair validation step
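The execute, capture, and decide steps above could fit together roughly as follows. This is a minimal sketch under stated assumptions, not NICo's implementation: it assumes the checks are started as a local process on the node, that exit code 0 means pass, and that diagnostic output is free-form text. `HealthCheckResult`, `NodeAction`, `run_health_checks`, `decide`, and `post_repair_validation` are hypothetical names, and the command passed in stands in for the real health-check binary.

```python
import subprocess
from dataclasses import dataclass, field
from enum import Enum


@dataclass(frozen=True)
class HealthCheckResult:
    """Normalized result NICo consumes (hypothetical shape)."""
    passed: bool
    summary: str
    diagnostics: list[str] = field(default_factory=list)


class NodeAction(Enum):
    """Hypothetical outcomes of the post-repair validation step."""
    RETURN_TO_POOL = "return_to_pool"
    REMAIN_IN_REPAIR = "remain_in_repair"


def run_health_checks(cmd: list[str], timeout_s: float = 600.0) -> HealthCheckResult:
    """Execute the health-check binary on the node and normalize its output.

    The binary only reports results; nothing here changes node state.
    """
    try:
        proc = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=timeout_s,
            check=False,  # a non-zero exit code is a result to report, not an exception
        )
    except subprocess.TimeoutExpired:
        # A hung run is reported as a failure (assumption: fail closed).
        return HealthCheckResult(False, f"fail: timed out after {timeout_s}s")
    except OSError as exc:
        # The binary could not be started (e.g. missing); also reported as a failure.
        return HealthCheckResult(False, f"fail: could not start checks ({exc})")

    # Assumption: exit code 0 means pass; the real contract may differ ("or equivalent").
    passed = proc.returncode == 0
    # Keep diagnostic output as raw non-empty lines; its real format is not specified.
    diagnostics = [
        line.strip()
        for line in (proc.stdout + "\n" + proc.stderr).splitlines()
        if line.strip()
    ]
    summary = "pass" if passed else f"fail: exit code {proc.returncode}"
    return HealthCheckResult(passed, summary, diagnostics)


def decide(result: HealthCheckResult) -> NodeAction:
    """NICo-side decision: pass returns the node to the pool, fail keeps it in repair."""
    return NodeAction.RETURN_TO_POOL if result.passed else NodeAction.REMAIN_IN_REPAIR


def post_repair_validation(cmd: list[str]) -> tuple[NodeAction, HealthCheckResult]:
    """Hypothetical break/fix post-repair step: run, capture, then decide."""
    result = run_health_checks(cmd)
    return decide(result), result
```

In this sketch the decision lives entirely in `decide`, so the binary stays a pure reporter. Timeouts and launch failures are treated as failures, which means a node whose checks could not run stays in repair instead of returning to the fleet.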
Describe any alternatives you have considered
No response
Additional context
No response