
fix: refactor _validate_shellcheck to async subprocess execution to unblock FastAPI event loop#986

Open
the404packet wants to merge 9 commits into
rishabh0510rishabh:mainfrom
the404packet:main

Conversation

@the404packet

@the404packet the404packet commented Jun 11, 2026

Contributor

Description

The _validate_shellcheck function was using subprocess.run, a synchronous blocking call, to execute the external shellcheck binary. Under concurrent load, this blocked the main ASGI event loop, degrading API throughput and causing request timeouts.
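The effect described above can be reproduced in isolation. The following is a minimal sketch, not code from this PR: `time.sleep` stands in for the blocking `subprocess.run` call, `asyncio.sleep` stands in for an awaited async subprocess, and a hypothetical heartbeat task counts how often the event loop gets to run in the meantime.

```python
import asyncio
import time


async def count_ticks(counter: list) -> None:
    """Heartbeat: bump counter[0] roughly every 10 ms while the loop is free."""
    while True:
        await asyncio.sleep(0.01)
        counter[0] += 1


async def blocking_work() -> None:
    # Stand-in for subprocess.run(...): the loop thread is stuck here.
    time.sleep(0.2)


async def cooperative_work() -> None:
    # Stand-in for an awaited async subprocess: the loop keeps running.
    await asyncio.sleep(0.2)


async def ticks_during(work) -> int:
    """Return how many heartbeats fired while `work` was running."""
    counter = [0]
    ticker = asyncio.create_task(count_ticks(counter))
    await asyncio.sleep(0)  # let the heartbeat task start
    await work()
    ticks = counter[0]
    ticker.cancel()
    return ticks


if __name__ == "__main__":
    print("blocking:", asyncio.run(ticks_during(blocking_work)))
    print("cooperative:", asyncio.run(ticks_during(cooperative_work)))
```

With the blocking variant the heartbeat never fires, which is the same starvation other in-flight requests would experience.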

This PR refactors the shellcheck execution to use asyncio.create_subprocess_exec, offloads the CPU-bound _validate_bash_ast (bashlex parsing) to a thread pool via run_in_executor, and makes validate_rendered_output fully async, eliminating the event loop blocking entirely.
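A minimal sketch of the thread-pool offload pattern, assuming a hypothetical `parse_script` function in place of the bashlex-based `_validate_bash_ast` (its body here is a placeholder, not the real parser):

```python
import asyncio


def parse_script(content: str) -> int:
    """Hypothetical stand-in for a synchronous parser; counts non-empty lines."""
    return sum(1 for line in content.splitlines() if line.strip())


async def validate(content: str) -> int:
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor.
    return await loop.run_in_executor(None, parse_script, content)


if __name__ == "__main__":
    print(asyncio.run(validate("echo a\n\necho b\n")))
```

Because of the GIL, pure-Python parsing in a worker thread still competes with the loop thread for CPU time. The offload keeps the loop able to schedule other tasks, but it does not make the parsing itself run in parallel.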

Related Issues

Fixes #532

Changes Made

  • Refactored _validate_shellcheck to async using asyncio.create_subprocess_exec instead of subprocess.run
  • Replaced subprocess.TimeoutExpired handling with asyncio.wait_for timeout; added explicit process.kill() + process.wait() to prevent zombie processes on timeout
  • Made validate_rendered_output async; offloaded _validate_bash_ast to thread pool via loop.run_in_executor
  • Simplified the AI safety check block in validate_rendered_output — removed the sync/async loop-detection workaround since the function is now always awaited
  • Updated all callers to await validate_rendered_output(...):
    • backend/app/templates/engine.py: render and render_all made async
    • backend/app/services/repair_service.py: render_repair made async
    • backend/app/ai/service.py: _validate_response_safety made async, both call sites updated
    • backend/app/api/v1/diagnose.py: 3 call sites updated
  • Updated all tests to use async def + await with pytestmark = pytest.mark.asyncio
  • Replaced monkeypatch(subprocess.run) in shellcheck tests with patch(asyncio.create_subprocess_exec) using AsyncMock
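The subprocess and timeout handling listed above can be sketched as follows. This is an illustration under assumptions, not the PR's code: `run_with_timeout` is a hypothetical helper, and the demo spawns the Python interpreter rather than shellcheck so that it runs without the binary installed.

```python
import asyncio
import contextlib
import sys
from typing import List, Optional


async def run_with_timeout(
    args: List[str], stdin_data: bytes, timeout: float
) -> Optional[bytes]:
    """Run a command without blocking the loop; return stdout, or None on timeout."""
    process = await asyncio.create_subprocess_exec(
        *args,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, _ = await asyncio.wait_for(process.communicate(stdin_data), timeout)
    except asyncio.TimeoutError:
        # The child may have exited between the timeout and the kill.
        with contextlib.suppress(ProcessLookupError):
            process.kill()
        await process.wait()  # reap the child so it does not linger as a zombie
        return None
    return stdout


if __name__ == "__main__":
    echo = [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"]
    print(asyncio.run(run_with_timeout(echo, b"ok", 10.0)))
```

Awaiting `process.wait()` after `kill()` is what collects the exit status; killing alone leaves the process entry behind until something reaps it.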

Verification

  • Added unit tests
  • Ran pytest tests/ successfully
  • Manually tested via the API / CLI
  • (If applicable) Generated scripts pass SafetyFilter

Documentation

  • Updated docs/FEATURES.md (if adding a feature/profile)
  • Updated CHANGELOG.md
  • Code is fully documented and type-hinted

Summary by CodeRabbit

  • Bug Fixes

    • Improved timeout handling for shell script validation to prevent system hangs.
    • Enhanced error detection and reporting in safety violation alerts.
  • Refactor

    • Optimized internal architecture of safety validation and template rendering processes for improved system stability and performance.
    • Streamlined validation pipeline for more responsive and reliable operations.
    • Enhanced logging and diagnostics for the validation workflow.

@vercel

vercel Bot commented Jun 11, 2026


@the404packet is attempting to deploy a commit to the rishabhmishra0510-5147's projects Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai

coderabbitai Bot commented Jun 11, 2026

Contributor


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e0717a2e-b438-45fb-9d59-4896754d3f38

📥 Commits

Reviewing files that changed from the base of the PR and between 5fd9291 and 3fc7900.

⛔ Files ignored due to path filters (1)
  • frontend/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (1)
  • frontend/package.json
📝 Walkthrough

Walkthrough

This PR converts the safety validation pipeline from synchronous to asynchronous execution to prevent event loop blocking during subprocess and validation operations. The core validate_rendered_output function is made async with async subprocess execution for shellcheck, thread-pooled AST parsing, and direct awaiting of AI audits. All downstream services, endpoints, and tests are updated to await these async operations.

Changes

Safety Validation & Service Layer Async Conversion

| Layer / File(s) | Summary |
| --- | --- |
| Safety validation async core infrastructure (backend/app/templates/safety.py) | validate_rendered_output is converted to async def. Shell script validation now runs shellcheck via asyncio.create_subprocess_exec with 5s timeout enforcement and JSON output parsing. AST parsing is offloaded to run_in_executor to avoid blocking. AI auditing now directly awaits the LLM completion method. Module imports updated to support async subprocess and JSON handling. |
| Template rendering async conversion (backend/app/templates/engine.py) | TemplateRenderer.render and render_all are converted to async methods. The render method now awaits validate_rendered_output. The render_all method loops through filenames, awaiting each self.render() call sequentially. |
| Repair service async integration (backend/app/services/repair_service.py) | RepairService.render_repair is converted to async def. The safety-filter validation step awaits validate_rendered_output, while preserving the returned payload structure (template_id, filename, content, size_bytes). |
| AI troubleshooting service async conversion (backend/app/ai/service.py) | _validate_response_safety is converted to async def and now awaits validate_rendered_output for root_cause, suggested fix fields, and safe commands. Both troubleshoot() and stream_troubleshoot() entry points now await this helper. |
| Diagnostic explain endpoint async conversion (backend/app/api/v1/diagnose.py) | The /diagnose/explain endpoint now awaits validate_rendered_output for issue summary, root cause, and each suggested step. The python_version assignment is reformatted into a multiline conditional expression preserving the extraction logic. |
| Integration tests async conversion (backend/tests/integration/test_ai_pipeline.py) | Module-level pytest.mark.asyncio enables async test execution. All repair and validation tests are converted to async def and await render_repair and validate_rendered_output. Template safety, prompt-to-repair flow, invalid template, and safety filter assertions remain functionally equivalent. |
| Unit tests async conversion and subprocess mocking (backend/tests/unit/templates/test_safety.py) | All tests converted to async def with pytest.mark.asyncio. Shellcheck tests now mock asyncio.create_subprocess_exec instead of subprocess.run. New test_shellcheck_timeout_graceful test added. AST, URL whitelisting, and dynamic-argument tests updated to await validate_rendered_output while preserving blocked/allowed assertions. |
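The JSON output parsing mentioned for safety.py might look roughly like the sketch below. The field names (`level`, `code`, `line`, `message`) follow shellcheck's `json` output format, but the helper name, the choice of which levels block, and the sample payload are assumptions: the PR text does not say which findings raise SafetyViolationError, and the sample is hand-written with placeholder codes, not real shellcheck output.

```python
import json
from typing import List

# Assumption: only "error"-level findings are treated as blocking.
BLOCKING_LEVELS = {"error"}


def blocking_findings(raw_stdout: bytes) -> List[str]:
    """Turn shellcheck-style JSON into messages for findings that should block."""
    if not raw_stdout.strip():
        return []  # no output means nothing was reported
    findings = json.loads(raw_stdout)
    return [
        f"SC{item['code']} (line {item['line']}): {item['message']}"
        for item in findings
        if item.get("level") in BLOCKING_LEVELS
    ]


# Hand-written placeholder payload; codes 9001/9002 are not real shellcheck codes.
SAMPLE = json.dumps(
    [
        {"line": 2, "level": "info", "code": 9001, "message": "placeholder info finding"},
        {"line": 3, "level": "error", "code": 9002, "message": "placeholder error finding"},
    ]
).encode()

if __name__ == "__main__":
    print(blocking_findings(SAMPLE))
```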

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant validate_rendered_output
  participant run_in_executor
  participant _validate_bash_ast
  participant _validate_shellcheck
  participant LLM_Client
  Caller->>validate_rendered_output: await validate_rendered_output(content, template_name, llm_client)
  validate_rendered_output->>run_in_executor: run_in_executor(_validate_bash_ast, content)
  run_in_executor->>_validate_bash_ast: parse and validate AST in thread
  _validate_bash_ast-->>run_in_executor: AST analysis result
  validate_rendered_output->>_validate_shellcheck: await _validate_shellcheck(content)
  _validate_shellcheck->>_validate_shellcheck: asyncio.create_subprocess_exec(shellcheck)
  _validate_shellcheck->>_validate_shellcheck: enforce 5s timeout, parse JSON output
  _validate_shellcheck-->>validate_rendered_output: SafetyViolationError or pass
  validate_rendered_output->>LLM_Client: await llm_client.complete(...) if enabled
  LLM_Client-->>validate_rendered_output: AI audit result
  validate_rendered_output-->>Caller: safe_content or raise SafetyViolationError
sequenceDiagram
  participant Caller
  participant render_all
  participant render
  participant validate_rendered_output
  Caller->>render_all: await render_all(output_filenames, context)
  loop for each filename
    render_all->>render: await render(filename, context)
    render->>render: template.render(context)
    render->>validate_rendered_output: await validate_rendered_output(content, template_name)
    validate_rendered_output-->>render: safe_content
    render-->>render_all: RenderResult(template_id, content, filename, size)
  end
  render_all-->>Caller: list of RenderResult
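
The sequential loop in this diagram can be sketched as below. `RenderResult`, `validate`, and `render` here are simplified hypothetical stand-ins that mirror the names in the diagram, not the project's classes.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RenderResult:
    filename: str
    content: str
    size: int


async def validate(content: str, template_name: str) -> str:
    """Hypothetical stand-in for validate_rendered_output; passes content through."""
    await asyncio.sleep(0)
    return content


async def render(filename: str, context: Dict[str, str]) -> RenderResult:
    content = f"# {filename}\necho {context['name']}\n"
    safe = await validate(content, filename)
    return RenderResult(filename, safe, len(safe.encode()))


async def render_all(filenames: List[str], context: Dict[str, str]) -> List[RenderResult]:
    results = []
    for filename in filenames:
        # One file at a time: order is preserved and the first failure stops the loop.
        results.append(await render(filename, context))
    return results


if __name__ == "__main__":
    print(asyncio.run(render_all(["a.sh", "b.sh"], {"name": "x"})))
```

Awaiting each call in turn keeps the behavior close to the previous synchronous loop. Running the renders concurrently (for example with `asyncio.gather`) would be a separate design choice that this PR does not describe.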

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • rishabh0510rishabh/EnvForage#468: Introduces the original validate_rendered_output safety phases (_validate_bash_ast and _validate_shellcheck) that this PR converts to async, with direct code-level overlap in the safety module structure.

Suggested labels

level:critical, level3, Hard

Suggested reviewers

  • rishabh0510rishabh

Poem

🐰 The rabbit hops through async streams so bright,
No blocking calls to dim the FastAPI light,
With shellcheck spawned and futures awaited true,
The event loop flows like morning dew!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 30.30% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically identifies the main change: converting shellcheck validation to async subprocess execution to unblock the FastAPI event loop. |
| Linked Issues check | ✅ Passed | The PR comprehensively addresses issue #532 by converting _validate_shellcheck to async using asyncio.create_subprocess_exec with proper timeout handling, and updating all call sites throughout the codebase. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to making validation async to unblock the event loop. Updates to render methods, service methods, and tests are necessary call-site changes required by the async validation conversion. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@the404packet

Contributor Author

@rishabh0510rishabh this PR is ready for review.

The failing backend and frontend tests are due to problems introduced by a previous PR.

These problems should be addressed in a new issue.

@github-actions

github-actions Bot commented Jun 11, 2026


🔍 PR Action Required

Hi @the404packet,

We detected some items on this Pull Request that require attention:

❌ Failing CI Checks

The following check runs or commit statuses are failing (ignoring vercel):

Please resolve the issues above to proceed.


Last updated: Thu, 11 Jun 2026 04:37:07 GMT

@coderabbitai coderabbitai Bot left a comment

Contributor


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/services/repair_service.py (1)

171-217: ⚠️ Potential issue | 🟠 Major

Update unit tests to await render_repair
render_repair is async def, but backend/tests/unit/ai/test_repair_service.py calls it without await (e.g., result = service.render_repair(template_id, params)), so result will be a coroutine and the assertions will fail. Update the affected tests to async def and await service.render_repair(...) (add the appropriate asyncio/pytest marker if needed).
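
The failure mode described here can be shown in a few lines. `FakeRepairService` is a hypothetical stand-in whose return value only mimics the payload keys named in this PR (template_id, filename, content, size_bytes).

```python
import asyncio
import inspect


class FakeRepairService:
    """Hypothetical stand-in for RepairService; not the project's implementation."""

    async def render_repair(self, template_id: str, params: dict) -> dict:
        content = f"echo {params['name']}\n"
        return {
            "template_id": template_id,
            "filename": f"{template_id}.sh",
            "content": content,
            "size_bytes": len(content.encode()),
        }


def call_without_await() -> bool:
    """Calling an async def without await yields a coroutine object, not a dict."""
    pending = FakeRepairService().render_repair("demo", {"name": "x"})
    is_coroutine = inspect.iscoroutine(pending)
    pending.close()  # avoid the "coroutine was never awaited" warning
    return is_coroutine


async def call_with_await() -> dict:
    return await FakeRepairService().render_repair("demo", {"name": "x"})


if __name__ == "__main__":
    print(call_without_await())
    print(asyncio.run(call_with_await()))
```

Any assertion such as `result["content"]` on the un-awaited value fails because a coroutine object is not subscriptable, which matches the test failures the comment predicts.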

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/services/repair_service.py` around lines 171 - 217, Tests
currently call the async function render_repair without awaiting it, returning a
coroutine; update each affected test (in the repair service unit tests) to be
async def and await service.render_repair(...), and add the appropriate pytest
async support (e.g., apply `@pytest.mark.asyncio` or ensure pytest-asyncio is
enabled) so assertions receive the actual result instead of a coroutine.
🧹 Nitpick comments (1)
backend/tests/unit/templates/test_safety.py (1)

319-328: ⚡ Quick win

Consider asserting cleanup calls in timeout test.

The test correctly verifies graceful timeout handling (content is returned). To strengthen the test, you could also assert that process.kill() and process.wait() were called to verify the zombie-prevention cleanup documented in the PR objectives.

🧪 Optional enhancement to verify cleanup behavior
 async def test_shellcheck_timeout_graceful():
     mock_process = AsyncMock()
     mock_process.communicate = AsyncMock(side_effect=TimeoutError())
     mock_process.kill = MagicMock()
     mock_process.wait = AsyncMock()

     with patch("asyncio.create_subprocess_exec", return_value=mock_process):
         content = "echo 'safe content'"
         assert await validate_rendered_output(content, "setup.sh") == content
+        mock_process.kill.assert_called_once()
+        mock_process.wait.assert_awaited_once()
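
For reference, a self-contained version of this mocking pattern is sketched below. `check_with_timeout` is a hypothetical stand-in for the shellcheck step (assumed to return the content unchanged on timeout, as the test above expects); the shellcheck arguments are never executed because `asyncio.create_subprocess_exec` is patched.

```python
import asyncio
import contextlib
from unittest.mock import AsyncMock, MagicMock, patch


async def check_with_timeout(content: str, timeout: float = 5.0) -> str:
    """Hypothetical stand-in: run a checker, fall back to the content on timeout."""
    process = await asyncio.create_subprocess_exec(
        "shellcheck", "--format=json", "-",
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        await asyncio.wait_for(process.communicate(content.encode()), timeout)
    except asyncio.TimeoutError:
        with contextlib.suppress(ProcessLookupError):
            process.kill()
        await process.wait()
    return content


async def demo_timeout_cleanup() -> str:
    mock_process = MagicMock()
    mock_process.communicate = AsyncMock(side_effect=asyncio.TimeoutError())
    mock_process.kill = MagicMock()   # kill() is synchronous on asyncio processes
    mock_process.wait = AsyncMock()   # wait() is a coroutine
    # patch() substitutes an AsyncMock because the target is an async function.
    with patch("asyncio.create_subprocess_exec", return_value=mock_process):
        result = await check_with_timeout("echo 'safe content'")
    mock_process.kill.assert_called_once()
    mock_process.wait.assert_awaited_once()
    return result


if __name__ == "__main__":
    print(asyncio.run(demo_timeout_cleanup()))
```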
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/tests/unit/templates/test_safety.py` around lines 319 - 328, The test
test_shellcheck_timeout_graceful should also assert that the subprocess cleanup
was invoked: after patching asyncio.create_subprocess_exec to return
mock_process and calling validate_rendered_output, add assertions that
mock_process.kill() was called (e.g., mock_process.kill.assert_called_once())
and that mock_process.wait() was awaited (e.g., awaitable assertion like
mock_process.wait.assert_awaited() or mock_process.wait.assert_awaited_once());
this verifies the timeout path triggers the documented zombie-prevention
cleanup.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: beea1608-1e9e-46f8-b189-5c9038fd689f

📥 Commits

Reviewing files that changed from the base of the PR and between 80ecef9 and 5fd9291.

⛔ Files ignored due to path filters (1)
  • frontend/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (7)
  • backend/app/ai/service.py
  • backend/app/api/v1/diagnose.py
  • backend/app/services/repair_service.py
  • backend/app/templates/engine.py
  • backend/app/templates/safety.py
  • backend/tests/integration/test_ai_pipeline.py
  • backend/tests/unit/templates/test_safety.py



Development

Successfully merging this pull request may close these issues.

[Bug] Shellcheck Subprocess Execution Blocks FastAPI Event Loop

1 participant