fix(ci): prevent CUDA hang and add test timeouts by royerloic · Pull Request #76 · royerlab/napari-chatgpt

royerloic · 2026-02-13T06:10:15Z

Summary

Set CUDA_VISIBLE_DEVICES="" in CI env to prevent TensorFlow/PyTorch from trying to initialize CUDA on CPU-only runners (CUDA pip packages are present but no GPU driver, causing an indefinite hang during test collection)
Add timeout-minutes: 30 to the CI job to prevent 6-hour runs (GitHub Actions default)
Add --timeout=300 (5 min per test) via pytest-timeout to kill individual hanging tests
Add pytest-timeout to testing dependencies
Mark 3 network-dependent tests as @pytest.mark.integration so they are excluded from CI: test_web_search_tool, test_wikipedia_search_tool, test_add_image_cell

Root cause

All recent CI runs were "cancelled" after ~6 hours. The test command pytest src/ -v -k "not integration and not llm and not slow" collects tests including cellpose_test.py and stardist_test.py, whose @pytest.mark.skipif(not check_*_installed(), ...) decorators import cellpose (→ PyTorch → CUDA) and stardist (→ csbdeep → TensorFlow → CUDA) at collection time. On GitHub Actions runners, the CUDA pip packages (nvidia-cuda-nvcc-cu12, etc.) are installed but no GPU driver exists, causing TensorFlow's CUDA initialization to hang indefinitely.

Test plan

All 233 tests pass locally with CUDA_VISIBLE_DEVICES="" and --timeout=300 in 3m28s
CI completes within 30 minutes after merge

🤖 Generated with Claude Code

Summary by CodeRabbit

Tests
- Marked several tests as integration to improve organization.
- Added CPU-only runtime controls and stricter job/test timeouts to reduce CI hangs.
Chores
- Added pytest-timeout to test dependencies.
Bug Fixes / Reliability
- Avoid heavy imports when checking optional imaging packages to reduce CI/workstation overhead.
- Make segmentation routines tolerant across scikit-image versions to prevent failures.
- Skip interactive API-key dialogs in non-interactive environments and require an API key before LLM initialization.

TensorFlow/PyTorch CUDA initialization hangs indefinitely on GitHub Actions CPU-only runners because CUDA pip packages are present but no GPU driver exists. Set CUDA_VISIBLE_DEVICES="" to disable GPU detection, add 30-minute job timeout, and 5-minute per-test timeout via pytest-timeout. Also mark network-dependent tests as integration so they are excluded from CI runs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai · 2026-02-13T06:10:43Z

Walkthrough

Adds CI job-level timeout and CPU-only env vars, introduces pytest timeout in CI and test deps, marks several tests as integration tests, replaces direct-import checks with importlib.util.find_spec for package detection, and adds scikit-image-version-aware calls to remove_small_objects.

Changes

Cohort / File(s)	Summary
CI Workflow `\.github/workflows/test.yml`	Adds 30-minute job timeout; sets CPU-only env vars (`CUDA_VISIBLE_DEVICES="-1"`, `TORCH_CUDA_ARCH_LIST=""`, `USE_CUDA="0"`); preserves existing test command but adds `--timeout=300` to pytest and keeps verbose/test selection.
Test Dependencies `pyproject.toml`	Adds `pytest-timeout` to optional test deps and to `tool.hatch.envs.hatch-test` extra-dependencies (reorders test-related entries).
Integration Test Markers `src/napari_chatgpt/omega_agent/tools/tests/web_search_tool_tests.py`, `src/napari_chatgpt/omega_agent/tools/tests/wikipedia_search_tool_tests.py`, `src/napari_chatgpt/utils/notebook/test/jupyter_notebook_test.py`	Adds `@pytest.mark.integration` to three tests and imports `pytest` where needed; no logic changes.
Package-detection refactor & tests `src/napari_chatgpt/omega_agent/tools/napari/delegated_code/utils.py`, `src/napari_chatgpt/omega_agent/tools/napari/delegated_code/test/utils_test.py`	Replace try/except import checks with `importlib.util.find_spec(...)` to detect `stardist`/`cellpose` without importing; update tests and module docstring accordingly.
API key / interactive guard `src/napari_chatgpt/llm/api_keys/api_key.py`	Add private `_is_interactive()` helper; skip GUI dialog in non-interactive/CI environments and adjust messaging/control flow in `set_api_key()`.
LLM availability pre-check `src/napari_chatgpt/llm/litemind_api.py`	`is_llm_available` now requires at least one API key (OpenAI/Anthropic/Gemini) before proceeding; adds local import/check for keys to avoid unnecessary initialization.
scikit-image compatibility (core code & tests) `src/napari_chatgpt/omega_agent/tools/napari/delegated_code/classic.py`, `src/napari_chatgpt/utils/napari/test/napari_viewer_info_test.py`, `src/napari_chatgpt/utils/segmentation/remove_small_segments.py`	Introduce version-aware calls to `remove_small_objects`: try `max_size=...` (newer scikit-image) and fall back to `min_size=...` on TypeError to preserve behavior across scikit-image versions.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I nibbled CI carrots, calm and spry,

Timeouts trimmed the thicket, tests hop by,
CUDA tucked in for a quiet nap,
mark.integration gave each test a map,
find_spec peeks — imports take a nap. 🎉

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title accurately summarizes the main changes: preventing CUDA hangs and adding test timeouts to the CI workflow, which are the primary objectives.
Merge Conflict Detection	✅ Passed	✅ No merge conflicts detected when merging into `main`

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch fix/ci-timeout-cuda-hang

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Replace `import stardist` / `import cellpose` in check_*_installed() with `importlib.util.find_spec()` which detects the package without loading it. This prevents TensorFlow/PyTorch from being imported at pytest collection time, which hangs on CI runners without GPU drivers. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add import verification and collect-only diagnostic steps with their own timeouts to pinpoint where the hang occurs. Use CUDA_VISIBLE_DEVICES=-1 (explicit no-GPU) instead of empty string, plus TORCH_CUDA_ARCH_LIST and USE_CUDA=0 to further prevent CUDA initialization. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In @.github/workflows/test.yml:
- Around line 64-66: The workflow step "Collect tests (diagnose hangs)"
currently pipes pytest into tail which masks pytest's exit code; update the
step's run command to enable pipefail (e.g., run under bash with set -o pipefail
or use bash -lc "set -o pipefail && ...") so that the exit code from pytest (the
command that may fail during collection) is propagated instead of being
swallowed by tail; retain the diagnostic behavior (xvfb-run, pytest args, and
tail -20) but ensure pipefail is set so failures in pytest cause the step to
fail.

coderabbitai · 2026-02-13T07:34:09Z

+      - name: Collect tests (diagnose hangs)
+        timeout-minutes: 5
+        run: xvfb-run pytest src/ --co --timeout=120 -k "not integration and not llm and not slow" 2>&1 | tail -20


⚠️ Potential issue | 🟡 Minor

Piping through tail swallows the pytest exit code.

If pytest --co fails (e.g., an import error during collection), the step will still pass because tail -20 exits 0. Since this is a diagnostic step meant to surface hangs, a silent failure here would be misleading.

Consider using pipefail to propagate the exit code:

Proposed fix

- run: xvfb-run pytest src/ --co --timeout=120 -k "not integration and not llm and not slow" 2>&1 | tail -20 + run: | + set -o pipefail + xvfb-run pytest src/ --co --timeout=120 -k "not integration and not llm and not slow" 2>&1 | tail -20

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

- name: Collect tests (diagnose hangs)

timeout-minutes: 5

run: xvfb-run pytest src/ --co --timeout=120 -k "not integration and not llm and not slow" 2>&1 | tail -20

- name: Collect tests (diagnose hangs)

timeout-minutes: 5

run: |

set -o pipefail

xvfb-run pytest src/ --co --timeout=120 -k "not integration and not llm and not slow" 2>&1 | tail -20

🤖 Prompt for AI Agents

In @.github/workflows/test.yml around lines 64 - 66, The workflow step "Collect tests (diagnose hangs)" currently pipes pytest into tail which masks pytest's exit code; update the step's run command to enable pipefail (e.g., run under bash with set -o pipefail or use bash -lc "set -o pipefail && ...") so that the exit code from pytest (the command that may fail during collection) is propagated instead of being swallowed by tail; retain the diagnostic behavior (xvfb-run, pytest args, and tail -20) but ensure pipefail is set so failures in pytest cause the step to fail.

Root cause: `is_llm_available()` (used in 10+ `@pytest.mark.skipif` decorators) called `get_litemind_api()` → `set_api_key()` which showed a blocking Qt dialog when API keys were missing. On CI this dialog hanged indefinitely, preventing test collection. - Add `_is_interactive()` guard to `set_api_key()` that detects pytest and CI environments and skips the dialog - Add early API key check in `is_llm_available()` to avoid expensive initialisation when no keys are set - Remove diagnostic CI steps no longer needed Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

scikit-image >= 0.24 added `max_size` parameter, and >= 0.26 deprecated `min_size`. Use try/except to support both old and new versions (Python 3.10 gets older scikit-image, Python 3.12 gets newer). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In @.github/workflows/test.yml:
- Around line 22-29: Replace the non-standard CUDA_VISIBLE_DEVICES value "-1"
with the CUDA-documented standard empty string "" in the GitHub Actions
environment block; update the env entry for CUDA_VISIBLE_DEVICES so it is set to
an empty string (CUDA_VISIBLE_DEVICES: "") to follow CUDA conventions and avoid
potential device-detection confusion (leave TORCH_CUDA_ARCH_LIST and USE_CUDA
unchanged).

In `@src/napari_chatgpt/utils/napari/test/napari_viewer_info_test.py`:
- Around line 33-38: The test currently mismatches scikit-image versions and
uses opposite semantics (max_size vs min_size); update the code to explicitly
detect the skimage version (skimage.__version__) and ensure consistent "remove
objects smaller than N" behavior: for versions that support min_size call
remove_small_objects(clear_border(bw), min_size=20), and for versions where
min_size was removed (>=0.26) emulate the same semantics by labeling
(skimage.measure.label) and removing small components via np.bincount to filter
out labels smaller than 20 so the test result is identical across versions;
reference remove_small_objects, clear_border, label and the bw test image when
making the change.

🧹 Nitpick comments (2)

src/napari_chatgpt/omega_agent/tools/napari/delegated_code/classic.py (1)
174-183: Third copy of the version-compat block — use remove_small_segments instead.

This file already has the min_segment_size parameter. The utility function remove_small_segments() in src/napari_chatgpt/utils/segmentation/remove_small_segments.py encapsulates this exact logic. Replacing this block with a call to it would eliminate the duplication.
Suggested refactor
+    from napari_chatgpt.utils.segmentation.remove_small_segments import remove_small_segments
+
     # Remove small objects:
-    try:
-        # scikit-image >= 0.24 (min_size deprecated in 0.26)
-        labels = remove_small_objects(
-            labels, max_size=min_segment_size - 1
-        )
-    except TypeError:
-        # scikit-image < 0.24
-        labels = remove_small_objects(
-            labels, min_size=min_segment_size
-        )
+    labels = remove_small_segments(labels, min_segment_size)
You can also remove remove_small_objects from the imports at the top of the file if no other call site uses it.
src/napari_chatgpt/llm/litemind_api.py (1)

56-63: Good early-exit to avoid expensive initialization and potential Qt dialog.

Minor note: the provider list ["OpenAI", "Anthropic", "Gemini"] is duplicated from api_key.py's __api_key_names dict. If providers are added later, both locations need updating. Consider exposing the key names from api_key.py (e.g., a public get_supported_providers() function) to keep them in sync.

coderabbitai · 2026-02-13T14:54:58Z

+    env:
+      # Prevent TensorFlow/PyTorch from trying to initialise CUDA on
+      # CPU-only runners (CUDA pip packages are present but no GPU
+      # driver exists, which causes an indefinite hang).
+      CUDA_VISIBLE_DEVICES: "-1"
+      # Force CPU-only mode for PyTorch and TensorFlow
+      TORCH_CUDA_ARCH_LIST: ""
+      USE_CUDA: "0"


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cat -n .github/workflows/test.yml | sed -n '20,35p'

Repository: royerlab/napari-chatgpt

Length of output: 703

🌐 Web query:

CUDA_VISIBLE_DEVICES empty string vs -1 behavior TensorFlow PyTorch

💡 Result:

CUDA semantics (applies to both TensorFlow and PyTorch)

CUDA_VISIBLE_DEVICES="" (empty string) ⇒ no GPUs are visible to CUDA. [1]

CUDA_VISIBLE_DEVICES="-1" ⇒ -1 is an invalid device index; CUDA’s rule is “stop at the first invalid index,” so with -1 as the first entry, no GPUs are visible (effectively the same outcome as empty string). [1]

Why people observe different behavior in TensorFlow / PyTorch

In practice, “"" didn’t work but -1 did” is usually because the env var was set too late (after the framework or a dependency already initialized CUDA / queried devices):

PyTorch: set CUDA_VISIBLE_DEVICES before importing torch (or set it in the shell). PyTorch maintainers explicitly warn that setting it after imports may have no effect. [3]

TensorFlow: device visibility must be configured before GPUs are initialized (TF notes this for tf.config.set_visible_devices; the same timing principle applies to CUDA_VISIBLE_DEVICES). [2]

Practical recommendations

To disable GPUs reliably, prefer setting it in the shell:

CUDA_VISIBLE_DEVICES="" python your_script.py (hide all GPUs) [1]

CUDA_VISIBLE_DEVICES="-1" python your_script.py (also hides all GPUs via “invalid index first”) [1]

If setting inside Python, do it at the very top, before importing TensorFlow/PyTorch (and ideally before any library that might touch CUDA). [2][3]

Related gotcha: reindexing

If you set CUDA_VISIBLE_DEVICES=2, the one visible physical GPU is renumbered as cuda:0 / /GPU:0 inside the process. [1]

Sources: NVIDIA CUDA Programming Guide on CUDA_VISIBLE_DEVICES [1]; TensorFlow GPU guide (initialization timing) [2]; PyTorch forum guidance on setting env vars before imports [3].

🌐 Web query:

CUDA_VISIBLE_DEVICES -1 invalid ordinal warnings

💡 Result:

CUDA_VISIBLE_DEVICES=-1 makes CUDA see no valid GPUs, because -1 is an invalid GPU index; when CUDA hits an invalid index in the list, it exposes only devices that appear before it—so with -1 first, that means zero visible devices. Any later attempt by your framework/code to use cuda:0, cuda:1, etc. then triggers “invalid device ordinal” warnings/errors. [1]

What to do instead:

To disable all GPUs (recommended): set an empty string

bash: export CUDA_VISIBLE_DEVICES="" [1]

(this is the CUDA-documented way to hide all GPUs) [1]

If you want GPU N, and only that GPU:

bash: export CUDA_VISIBLE_DEVICES=N

then inside the program, use cuda:0, because CUDA renumbers visible devices starting at 0. [1]

PyTorch-specific gotcha: if you run with CUDA_VISIBLE_DEVICES=1, your script must use cuda:0 (not cuda:1), or you’ll get invalid device ordinal. [2]

Also ensure you set CUDA_VISIBLE_DEVICES before importing GPU-using libraries (PyTorch/TensorFlow), since they may initialize CUDA on import. [2]

Sources:
[1] NVIDIA CUDA Programming Guide — CUDA_VISIBLE_DEVICES behavior and empty-string disabling GPUs
[2] PyTorch Forums — explanation of device renumbering after masking with CUDA_VISIBLE_DEVICES

Change CUDA_VISIBLE_DEVICES to empty string for standard CUDA practice — while "-1" effectively hides all GPUs (by being an invalid device index that CUDA stops at), the CUDA-documented standard approach is to use an empty string (""). Using "" aligns with CUDA conventions and avoids potential confusion or issues if code later attempts to access CUDA devices.

Suggested fix

- CUDA_VISIBLE_DEVICES: "-1" + CUDA_VISIBLE_DEVICES: ""

🤖 Prompt for AI Agents

In @.github/workflows/test.yml around lines 22 - 29, Replace the non-standard CUDA_VISIBLE_DEVICES value "-1" with the CUDA-documented standard empty string "" in the GitHub Actions environment block; update the env entry for CUDA_VISIBLE_DEVICES so it is set to an empty string (CUDA_VISIBLE_DEVICES: "") to follow CUDA conventions and avoid potential device-detection confusion (leave TORCH_CUDA_ARCH_LIST and USE_CUDA unchanged).

coderabbitai · 2026-02-13T14:54:58Z

+    try:
+        # scikit-image >= 0.24 (min_size deprecated in 0.26)
+        cleared = remove_small_objects(clear_border(bw), max_size=19)
+    except TypeError:
+        # scikit-image < 0.24
+        cleared = remove_small_objects(clear_border(bw), min_size=20)


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

scikit-image remove_small_objects max_size parameter changelog 0.24

💡 Result:

In scikit-image 0.24.0 (2024-06-18), skimage.morphology.remove_small_objects does not have a max_size parameter—its signature is:

remove_small_objects(ar, min_size=64, connectivity=1, *, out=None) and it “removes objects smaller than min_size”. [1]

The max_size keyword (and the corresponding deprecation of min_size) was introduced later—“Added in version 0.26” (per the current/stable docs), i.e. not in the 0.24 changelog/release notes. [2]

Sources:
[1] scikit-image 0.24.0 API docs (remove_small_objects signature shows min_size) https://scikit-image.org/docs/0.24.x/api/skimage.morphology.html
[2] scikit-image stable API docs (remove_small_objects shows max_size, “Added in version 0.26”, min_size deprecated) https://scikit-image.org/docs/stable/api/skimage.morphology

The comment is misleading about the version timeline, and the code has a semantic issue with parameter usage.

The max_size parameter was introduced in scikit-image 0.26, not 0.24. In versions 0.24 and 0.25, the try block will always raise TypeError and silently fall back to the except clause—the code works only by accident.

Additionally, max_size and min_size have opposite semantics: min_size removes objects smaller than the threshold, while max_size removes objects larger than the threshold. Using max_size=19 versus min_size=20 produces functionally different results, suggesting a logic error. The comment should clarify the version constraint and the semantic difference should be addressed.

🤖 Prompt for AI Agents

In `@src/napari_chatgpt/utils/napari/test/napari_viewer_info_test.py` around lines 33 - 38, The test currently mismatches scikit-image versions and uses opposite semantics (max_size vs min_size); update the code to explicitly detect the skimage version (skimage.__version__) and ensure consistent "remove objects smaller than N" behavior: for versions that support min_size call remove_small_objects(clear_border(bw), min_size=20), and for versions where min_size was removed (>=0.26) emulate the same semantics by labeling (skimage.measure.label) and removing small components via np.bincount to filter out labels smaller than 20 so the test result is identical across versions; reference remove_small_objects, clear_border, label and the bw test image when making the change.

royerloic and others added 2 commits February 12, 2026 22:49

coderabbitai Bot reviewed Feb 13, 2026

View reviewed changes

royerloic and others added 2 commits February 13, 2026 06:33

coderabbitai Bot reviewed Feb 13, 2026

View reviewed changes

royerloic merged commit a14d738 into main Feb 13, 2026
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(ci): prevent CUDA hang and add test timeouts#76

fix(ci): prevent CUDA hang and add test timeouts#76
royerloic merged 5 commits into
mainfrom
fix/ci-timeout-cuda-hang

royerloic commented Feb 13, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Feb 13, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Feb 13, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Feb 13, 2026

Uh oh!

coderabbitai Bot Feb 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

royerloic commented Feb 13, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Root cause

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Feb 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Feb 13, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Feb 13, 2026

Choose a reason for hiding this comment

CUDA semantics (applies to both TensorFlow and PyTorch)

Why people observe different behavior in TensorFlow / PyTorch

Practical recommendations

Related gotcha: reindexing

Uh oh!

coderabbitai Bot Feb 13, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

royerloic commented Feb 13, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Feb 13, 2026 •

edited

Loading