diff --git a/.github/codeql/README.md b/.github/codeql/README.md new file mode 100644 index 00000000..4eee57ca --- /dev/null +++ b/.github/codeql/README.md @@ -0,0 +1,159 @@ +# CodeQL configuration + +This directory contains the CodeQL configuration consumed by both +`.github/workflows/codeql.yml` and `.github/workflows/codeql-advanced.yml` +via the `config-file:` key on the `github/codeql-action/init` step. + +## Files + +- `config.yml` -- query filter rules. See inline comments for per-rule + rationale. + +## Query suite vs config-file interaction + +The `queries:` key on `github/codeql-action/init` (set to +`security-and-quality` in `codeql.yml`, default in `codeql-advanced.yml`) +selects which CodeQL query *suite* runs. The `config-file:` value points +at this directory's `config.yml`, which defines `query-filters`. + +Per the [CodeQL Action docs][codeql-config-docs] and the +"queries listed here will override any specified in a config file" +comment block in `.github/workflows/codeql-advanced.yml`: + +- A workflow-level `queries:` value *replaces* a config-file-level + `queries:` value unless the workflow value is prefixed with `+`. +- `query-filters` from the config-file always layer *on top* of whichever + suite is active. + +`config.yml` here intentionally does not define a `queries:` key, so the +workflow-selected suite (`security-and-quality` in `codeql.yml`) remains +the active set; this config only contributes the `query-filters` layer. +If a future contributor adds `queries:` to `config.yml`, the suite will +*replace* the workflow's `security-and-quality` selection unless +explicitly merged with `+queries: ...` in the workflow. + +[codeql-config-docs]: https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/customizing-your-advanced-setup-for-code-scanning#using-a-custom-configuration-file + +## Suppressed queries + +### `py/unsafe-cyclic-import` (global) + +**Why it fires:** the rule walks `if TYPE_CHECKING:` blocks for cycle +detection. `strands_robots/simulation/{base,policy_runner,benchmark}.py` +form a deliberate static-only cycle so `policy_runner` can call into the +`SimEngine` ABC defined in `base`, while `base` can advertise +`PolicyRunner`-typed return values to static checkers (mypy, IDE +navigation, `typing.get_type_hints` consumers). + +**Why it is safe at runtime:** the cycle is asymmetric, not closed. + +- `simulation/base.py`'s top-level `from + strands_robots.simulation.policy_runner import PolicyRunner, + VideoConfig` is the runtime edge that static analysers see and flag. + (Cited by symbol rather than line number so future edits above the + import don't silently invalidate this rationale.) +- `simulation/policy_runner.py`'s `if TYPE_CHECKING:` block imports + `SimEngine`, `BenchmarkProtocol`, and `Policy`. These are not real + edges at import time -- `TYPE_CHECKING` is `False` at runtime, the + block is skipped, and the names exist only for the static type + system. +- `simulation/benchmark.py`'s `if TYPE_CHECKING:` block similarly + imports `SimEngine`. + +There is therefore no runtime back-edge from `policy_runner` (or +`benchmark`) into `base`. `base` finishes importing cleanly before +anything in `policy_runner` runs at module scope, so the cycle never +closes at import time. `from __future__ import annotations` (PEP 563) +on every affected file is the additional belt-and-braces guarantee +that type-hint resolution itself never re-enters `base` or +`policy_runner` via `typing.get_type_hints` consumers. + +The CodeQL-independent regression contract pins this invariant: + +- `tests/simulation/test_no_cyclic_imports.py` -- spawns a fresh Python + interpreter for each of the three affected modules (and a combined + one-process run) and asserts each imports cleanly with no + `RecursionError` / `ImportError` traceback frames in stderr (anchored + on the start-of-line `ExceptionName:` framing rather than a substring + match, so benign log lines that mention the exception names by name + do not fail the pin). Catches the dynamic-import failure mode where + a top-level statement reintroduces a partial-module re-entry. +- `tests/simulation/test_no_import_cycle.py::test_no_runtime_import_cycles` + -- builds the runtime import graph (excluding `TYPE_CHECKING` and + inside-function edges) via `networkx.simple_cycles` and asserts it is + acyclic. Catches the static-graph failure mode where someone hoists a + `TYPE_CHECKING` import to module scope and re-creates the cycle. +- `tests/simulation/test_no_import_cycle.py::test_base_does_not_lazy_import_policy_runner` + -- counts module-level imports of `policy_runner` from `base.py` and + asserts the count stays at 1 (the documented module-level edge), so a + re-introduced inline lazy import inside a `SimEngine` method (the + prior bug shape) fails loudly. + +If the runtime cycle is ever reintroduced, those pins fail loudly in CI +-- independent of CodeQL's view of the AST. + +**Why a global suppression rather than path-scoped:** CodeQL's +`query-filters` keys filter by `id` / `tags` / `precision` only; +path-scoped exclusion of a single query is not supported (the only +path-aware key is `paths-ignore`, which excludes a file from all +queries -- too broad). The simulation triple is the only file shape in +the repository where this query fires today, so a global exclude is +equivalent in effect *today* and keeps the config small. The cost we +accept is that a future legitimate violation in unrelated code would +be silently suppressed too. Mitigation: the regression pins above +guard against runtime cycles independent of CodeQL, and a future +contributor who introduces a new legitimate violation should drop this +suppression and fix the new cycle properly rather than extend the +filter. + +**Periodic audit reminder:** the assumption that the simulation triple +is the only file shape that fires this rule today drifts silently as +the codebase grows. Maintainers should periodically (every minor +release or every six months, whichever is sooner) enumerate what +`py/unsafe-cyclic-import` would flag absent the filter, e.g. by running +the CodeQL CLI locally with the suppression dropped: + +```bash +# Drop the exclude block in .github/codeql/config.yml temporarily, then: +codeql database create db --language=python --source-root=. +codeql database analyze db \ + codeql/python-queries:Imports/UnsafeCyclicImport.ql \ + --format=sarif-latest --output=cyclic-import.sarif +# Inspect cyclic-import.sarif: every result outside the simulation +# triple is a candidate true-positive that the global exclude is +# silently hiding. +``` + +If results outside the simulation triple appear, drop the suppression +(or narrow it via path-scoped surgery) and fix the new cycle properly. + +**References:** + +- PR #209 -- the multi-round attempt to satisfy CodeQL via code surgery. + Paused at draft after the constraint conflict between mypy and + `py/unsafe-cyclic-import` was confirmed locally; full analysis in + the S13/R6 changelog of that PR. +- Issue #215 -- this follow-up tracking the suppression. +- CodeQL alerts #83, #84 -- closed by PR #209 R4. +- CodeQL alerts #85, #86, #87 -- raised by the rule against the + `if TYPE_CHECKING:` blocks in `benchmark.py` and `policy_runner.py`; + closed by this suppression. (Historical line numbers in the alert + bodies: `benchmark.py:42`, `policy_runner.py:49-50` at the SHA that + raised the alerts.) +- CodeQL alerts #253, #254, #255 -- recurrent alerts on `base.py`'s + `if TYPE_CHECKING:` block introduced by PR #209 R5 attempting to + recover mypy type info; closed by this suppression. (Historical line + number: `base.py:28` at the SHA that raised the alerts.) + +## How to extend + +When adding a new suppression: + +1. Add the `query-filters` entry to `config.yml` with a thorough + inline comment explaining the rule, why it fires here, and why + the suppression is safe. +2. Add a section to this README mirroring the inline comment, plus + the regression contract (pin tests, runtime-safety argument). +3. Ensure the affected source files carry a top-of-file comment + pointing at the config, so future readers find the rationale + without grep-spelunking. diff --git a/.github/codeql/config.yml b/.github/codeql/config.yml new file mode 100644 index 00000000..f534aa76 --- /dev/null +++ b/.github/codeql/config.yml @@ -0,0 +1,53 @@ +# CodeQL configuration for strands-robots. +# +# Wired into both .github/workflows/codeql.yml (CodeQL) and +# .github/workflows/codeql-advanced.yml (CodeQL Advanced) via the +# `config-file:` key on the `github/codeql-action/init` step. +# +# See .github/codeql/README.md for the full rationale per query exclusion. + +name: "strands-robots CodeQL config" + +# Inherit the default query suite selected per workflow (security-and-quality +# in codeql.yml; default in codeql-advanced.yml). We layer query-filters on +# top to suppress documented false-positives. +query-filters: + # py/unsafe-cyclic-import: + # + # Globally suppressed. The rule walks `if TYPE_CHECKING:` blocks for cycle + # detection, which makes it fire on the deliberate static-only cycle in + # strands_robots/simulation/{base,policy_runner,benchmark}.py. + # + # Runtime is safe because the cycle is asymmetric, not closed: + # base.py's top-level ``from .policy_runner import PolicyRunner, + # VideoConfig`` statement is the only runtime edge. policy_runner.py + # imports SimEngine / BenchmarkProtocol / Policy from base only inside + # its ``if TYPE_CHECKING:`` block -- no runtime back-edge closes the + # cycle at import time. ``from __future__ import annotations`` (PEP + # 563) is additional belt-and-braces so type-hint resolution at runtime + # never re-enters either module. + # + # This is the only file shape in the repository where the rule fires + # today; a global suppression keeps the config small and avoids + # path-scoped filter syntax (CodeQL `query-filters` only filters by + # id / tags / precision, not by path). The cost we accept is that a + # future legitimate violation in unrelated code is silently suppressed + # too. Mitigation: the regression pins below guard runtime-safety + # invariants independent of CodeQL, and a future contributor who hits + # a new legitimate violation should drop this suppression and fix the + # new cycle properly rather than extend the filter. + # + # Regression contract: tests/simulation/test_no_cyclic_imports.py and + # tests/simulation/test_no_import_cycle.py pin the runtime-safety + # invariants. The first spawns fresh interpreters and asserts each + # affected module imports cleanly with no RecursionError / ImportError; + # the second builds the runtime import graph (excluding TYPE_CHECKING + # and inside-function edges) and asserts it is acyclic. Both fail + # loudly if the runtime cycle is ever reintroduced, independent of + # CodeQL's view of the AST. + # + # Refs: PR #209 (paused, full constraint analysis in S13/R6); issue #215 + # (this suppression); CodeQL alerts #83-#87, #253-#255 (the recurrent + # alerts the suppression closes). + - exclude: + id: py/unsafe-cyclic-import diff --git a/.github/workflows/codeql-advanced.yml b/.github/workflows/codeql-advanced.yml index 4ab6bbad..535afb54 100644 --- a/.github/workflows/codeql-advanced.yml +++ b/.github/workflows/codeql-advanced.yml @@ -71,6 +71,9 @@ jobs: with: languages: ${{ matrix.language }} build-mode: ${{ matrix.build-mode }} + # Repo-level query filters (suppress documented false-positives). + # See .github/codeql/README.md for per-rule rationale. + config-file: ./.github/codeql/config.yml # If you wish to specify custom queries, you can do so here or in a config file. # By default, queries listed here will override any specified in a config file. # Prefix the list here with "+" to use these queries and those in the config file. diff --git a/.github/workflows/codeql.yml b/.github/workflows/codeql.yml index d466389e..dc60877e 100644 --- a/.github/workflows/codeql.yml +++ b/.github/workflows/codeql.yml @@ -33,10 +33,13 @@ jobs: # surfaced in PR #85 (MJCF interpolation) and PR #90 (subprocess # injection / path traversal). queries: security-and-quality + # Repo-level filters layered on top of the selected suite. See + # .github/codeql/README.md for per-rule rationale. + config-file: ./.github/codeql/config.yml - name: Perform CodeQL Analysis uses: github/codeql-action/analyze@4e828ff8d448a8a6e532957b1811f387a63867e8 # v3.29.4 with: category: "/language:python" # Findings appear in the Security tab; PRs are not blocked on - # CodeQL alerts (false-positive rate makes hard-fail noisy). \ No newline at end of file + # CodeQL alerts (false-positive rate makes hard-fail noisy). diff --git a/strands_robots/simulation/base.py b/strands_robots/simulation/base.py index 58892756..615df7a4 100644 --- a/strands_robots/simulation/base.py +++ b/strands_robots/simulation/base.py @@ -26,11 +26,34 @@ if TYPE_CHECKING: from strands_robots.policies import Policy -# PolicyRunner and VideoConfig are used by run_policy / replay / eval_policy. -# We could defer these with inline lazy imports (and historically did), but -# policy_runner.py only imports `SimEngine` from base under TYPE_CHECKING so -# the runtime cycle doesn't actually exist. Keep the imports at module level -# to break the AST-visible cycle that static analysers flag. +# CodeQL: the AST cycle base <-> policy_runner is a documented false-positive +# suppressed via .github/codeql/config.yml (py/unsafe-cyclic-import). +# See .github/codeql/README.md for the full rationale. +# +# Why this is safe at runtime: the cycle is asymmetric, not closed. +# This module imports ``PolicyRunner`` / ``VideoConfig`` at module level +# (the runtime edge), but ``policy_runner.py`` imports ``SimEngine`` from +# ``base`` only under ``TYPE_CHECKING``. There is no runtime back-edge +# from ``policy_runner`` into this module, so the cycle never closes at +# import time. Do NOT hoist the ``TYPE_CHECKING`` import in +# ``policy_runner.py`` to module scope -- that would close the loop and +# break ``import strands_robots.simulation.policy_runner`` with +# ``ImportError: cannot import name 'SimEngine' from partially initialized +# module ...``. ``from __future__ import annotations`` (PEP 563) is +# additional belt-and-braces, but the asymmetric edge is what makes the +# import work; PEP 563 only governs how type hints are resolved later. +# +# Pin tests guard this invariant independent of CodeQL: +# - tests/simulation/test_no_cyclic_imports.py: spawns fresh interpreters +# per affected module and asserts each imports cleanly. +# - tests/simulation/test_no_import_cycle.py: AST-builds the runtime +# import graph (excluding TYPE_CHECKING / inside-function edges) and +# asserts it is acyclic; also pins that this module keeps exactly one +# module-level import of policy_runner (re-introducing inline lazy +# imports inside SimEngine methods was the prior bug shape). +# +# PolicyRunner and VideoConfig are used by run_policy / replay / +# eval_policy on this class. # # Note (#191): we deliberately do NOT import ``OnFrame`` here, even under # ``TYPE_CHECKING`` — CodeQL's ``py/unsafe-cyclic-import`` rule walks diff --git a/strands_robots/simulation/benchmark.py b/strands_robots/simulation/benchmark.py index 4abe25e2..3b05c75a 100644 --- a/strands_robots/simulation/benchmark.py +++ b/strands_robots/simulation/benchmark.py @@ -38,6 +38,10 @@ from dataclasses import dataclass, field from typing import TYPE_CHECKING, Any +# CodeQL: the AST cycle benchmark <-> base is suppressed via +# .github/codeql/config.yml (py/unsafe-cyclic-import). The edge +# exists only under TYPE_CHECKING; runtime is safe under PEP 563. +# See .github/codeql/README.md. if TYPE_CHECKING: from strands_robots.simulation.base import SimEngine diff --git a/strands_robots/simulation/policy_runner.py b/strands_robots/simulation/policy_runner.py index 8b1cd0cf..d1c36bb6 100644 --- a/strands_robots/simulation/policy_runner.py +++ b/strands_robots/simulation/policy_runner.py @@ -44,6 +44,10 @@ from strands_robots._async_utils import _resolve_coroutine from strands_robots.utils import require_optional +# CodeQL: the AST cycle policy_runner <-> base / benchmark is suppressed +# via .github/codeql/config.yml (py/unsafe-cyclic-import). These edges +# exist only under TYPE_CHECKING; runtime is safe under PEP 563. +# See .github/codeql/README.md. if TYPE_CHECKING: from strands_robots.policies.base import Policy from strands_robots.simulation.base import SimEngine diff --git a/tests/simulation/test_no_cyclic_imports.py b/tests/simulation/test_no_cyclic_imports.py new file mode 100644 index 00000000..9cf73a95 --- /dev/null +++ b/tests/simulation/test_no_cyclic_imports.py @@ -0,0 +1,162 @@ +"""Regression: simulation triple imports cleanly in fresh interpreters. + +Complement to ``tests/simulation/test_no_import_cycle.py``. Where that test +asserts the *static* runtime-import graph (parsed from source) is acyclic, +this test asserts the *dynamic* import actually succeeds in a clean Python +process for each affected module. + +The two pins catch different failure modes: + +- A static-graph regression (e.g. someone hoists ``from .. import SimEngine`` + out of a ``TYPE_CHECKING`` block in ``policy_runner.py``) is caught by + ``test_no_import_cycle.py``: the AST scan would re-add the ``policy_runner + -> base`` edge and the cycle would resurface. +- A dynamic-import regression (e.g. a top-level statement in one of the + modules raises during import, or the import order produces a partial-module + ``ImportError`` only at process start) is caught here: the static graph + could remain acyclic while the *actual* import still fails. A subprocess + per module isolates each import from cached state in the test runner. + +Together they form the CodeQL-independent regression contract referenced +from ``.github/codeql/config.yml`` and ``.github/codeql/README.md``. + +Pinned modules (the documented static-only cycle suppressed by +``py/unsafe-cyclic-import``): + +- ``strands_robots.simulation.base`` +- ``strands_robots.simulation.policy_runner`` +- ``strands_robots.simulation.benchmark`` +""" + +from __future__ import annotations + +import os +import re +import subprocess +import sys + +import pytest + +# The simulation triple that participates in the documented static-only +# cycle. If a future suppression covers a new file shape, add it here so +# the regression contract grows with the suppression list. +_SIMULATION_TRIPLE: tuple[str, ...] = ( + "strands_robots.simulation.base", + "strands_robots.simulation.policy_runner", + "strands_robots.simulation.benchmark", +) + +# Traceback-shape regex for the regression assertions below. We anchor on +# the start-of-line ``ExceptionName:`` framing that Python emits for an +# uncaught traceback (and for chained ``raise from`` re-raises) rather than +# substring-matching the bare exception name. Substring matching is too +# loose: any benign log line, deprecation warning, or docstring snippet +# that happens to mention ``ImportError`` / ``RecursionError`` (for +# example, an optional-dep warning saying "...will raise ImportError on +# Python 3.13...") would otherwise fail the pin even though the import +# itself succeeded. The traceback frame is the only stderr shape that +# *actually* indicates the failure mode this pin exists to catch. +_TRACEBACK_EXC_PATTERN = re.compile(r"^(ImportError|RecursionError):", re.MULTILINE) + + +def _subprocess_env() -> dict[str, str]: + """Env that keeps the parent ``sys.path`` visible to the child. + + A bare ``python -c "import ..."`` in CI relies on the parent process's + ``sys.path`` (set by the editable install or the wheel install on the + runner). Rather than re-deriving that path, propagate it via + ``PYTHONPATH``: same module-resolution behaviour as the parent, but a + fresh ``sys.modules`` cache so the per-process import-time dynamics + are exercised honestly. + + Empty entries in ``sys.path`` and an empty inherited ``PYTHONPATH`` are + filtered out before joining. POSIX interprets a leading, trailing, or + interior empty pathsep entry as the *current working directory*, which + on a CI runner is the checkout root: that would silently inject ``.`` + into the child's ``sys.path`` and could mask a regression where a + module is only importable because ``cwd`` happens to contain it. + """ + env = os.environ.copy() + # Build the explicit path list: real entries from sys.path first, then + # the parent's PYTHONPATH if (and only if) it is set and non-empty. + # ``-I`` would strip these and produce a false positive. + parts = [p for p in sys.path if p] + inherited = env.get("PYTHONPATH", "") + if inherited: + # Split on pathsep and filter empty entries individually rather + # than appending the inherited string verbatim. POSIX interprets + # any empty pathsep entry (leading ``:foo``, trailing ``foo:``, + # interior ``foo::bar``) as the *current working directory*, so + # appending an inherited PYTHONPATH that already contains an + # internal empty would silently inject ``cwd`` into the child's + # ``sys.path`` even though we filter our own ``sys.path`` empties + # above. + parts.extend(p for p in inherited.split(os.pathsep) if p) + env["PYTHONPATH"] = os.pathsep.join(parts) + return env + + +@pytest.mark.parametrize("module", _SIMULATION_TRIPLE) +def test_module_imports_in_fresh_interpreter(module: str) -> None: + """Each module imports cleanly in a brand-new Python process. + + Spawns ``python -c "import "`` so the import is not contaminated + by the test runner's already-cached ``sys.modules``. Asserts exit code 0 + and no ``RecursionError`` / ``ImportError`` traceback frames in stderr; + on failure surfaces both for diagnosis. + + The 30-second timeout guards against an infinite-recursion regression + (the precise failure mode this pin exists to catch) hanging the suite. + """ + result = subprocess.run( + [sys.executable, "-c", f"import {module}"], + capture_output=True, + text=True, + timeout=30, + env=_subprocess_env(), + ) + assert result.returncode == 0, ( + f"fresh-interpreter import of {module} failed (exit {result.returncode}).\n" + f"stdout:\n{result.stdout}\n" + f"stderr:\n{result.stderr}" + ) + # Surface any import-time recursion or partial-failure traces as well. + # An import that "succeeds" (exit 0) but emits a traceback on stderr + # from a swallowed inner frame is still a regression. The regex anchors + # on the start-of-line ``ExceptionName:`` framing so benign log lines + # that happen to mention these exception names by name do not fail + # the pin. + match = _TRACEBACK_EXC_PATTERN.search(result.stderr) + assert match is None, ( + f"{match.group(1)} traceback surfaced during fresh-interpreter import of {module}:\n{result.stderr}" + ) + + +def test_full_triple_imports_in_one_process() -> None: + """Importing all three modules in a single process also succeeds. + + Catches the failure mode where each module imports cleanly in isolation + (the per-module test above passes) but the *combination* hits a + partial-module state -- e.g. ``base`` is mid-import, ``policy_runner`` + is imported as a side-effect, and ``policy_runner`` then re-enters + ``base`` before its module dict is fully populated. + """ + imports = "; ".join(f"import {m}" for m in _SIMULATION_TRIPLE) + result = subprocess.run( + [sys.executable, "-c", imports], + capture_output=True, + text=True, + timeout=30, + env=_subprocess_env(), + ) + assert result.returncode == 0, ( + f"combined fresh-interpreter import of the simulation triple failed " + f"(exit {result.returncode}).\n" + f"stdout:\n{result.stdout}\n" + f"stderr:\n{result.stderr}" + ) + match = _TRACEBACK_EXC_PATTERN.search(result.stderr) + assert match is None, ( + f"{match.group(1)} traceback surfaced during combined fresh-interpreter " + f"import of the simulation triple:\n{result.stderr}" + )