Skip to content

Refactor/state management and robustness#3

Open
yokoszn wants to merge 3 commits into
mainfrom
refactor/state-management-and-robustness
Open

Refactor/state management and robustness#3
yokoszn wants to merge 3 commits into
mainfrom
refactor/state-management-and-robustness

Conversation

@yokoszn

@yokoszn yokoszn commented Nov 23, 2025

Copy link
Copy Markdown

Summary by CodeRabbit

  • New Features

    • Added scope-based configuration system for managing targets, networks, and exclusions via YAML/JSON files
    • Implemented circuit breaker pattern for LLM API resilience with automatic recovery
    • Added response caching and streaming support for LLM requests
    • Introduced role-based tool access control for agents
    • Added real-time event tracing and telemetry for scan visibility
    • Implemented progress persistence utilities for tracking scan state
  • Documentation

    • Added comprehensive prompting guide and scope configuration reference
    • Created scope templates and configuration examples
  • Improvements

    • Enhanced agent status management with better state transitions
    • Improved error handling and retry logic for LLM requests
    • Added development Docker setup with docker-compose

✏️ Tip: You can customize this high-level summary in your review settings.

yokoszn and others added 3 commits November 23, 2025 04:31
Scope Configuration System:
- Add strix/scope/ module with Pydantic models for scope configuration
- Support YAML/JSON scope files with networks, targets, exclusions, domains
- Add --scope, --filter, --validate CLI arguments
- Inject scope_context and exclusion_rules into agent task descriptions
- Add scope awareness section to root_agent.jinja prompt

Role-Based Tool Enforcement:
- Add TOOL_PROFILES with 6 roles: root, recon, testing, validation, reporting, fixing
- Implement is_tool_allowed_for_role() in registry.py
- Add runtime enforcement in executor.py validate_tool_availability()
- Track agent_role in AgentState and LLMConfig

Additional Features:
- Add Proxmox VE prompt module (proxmox_ve.jinja)
- Add documentation: SCOPE_CONFIGURATION.md, PROMPTING_GUIDE.md
- Add scope templates: scope.yaml, scope.json, proxmox-cluster.yaml
- Add development configs: Dockerfile.dev, docker-compose.dev.yml, .env.example
- Add cache-implementation-project.md planning document

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
State Management:
- Add AgentStatus enum replacing boolean flags (completed, stop_requested,
  waiting_for_input, llm_failed) for single source of truth
- Add always-on timeout (max_wait_seconds=300) to prevent indefinite stalls
- Add consecutive_empty_responses tracking with MAX_EMPTY_RESPONSES=3 limit
- Add failure_reason field for explicit failure tracking
- Add backward-compatible properties for existing code

LLM Improvements:
- Add retry with exponential backoff for transient errors (rate limits,
  timeouts, service unavailable)
- Add RETRYABLE_ERRORS tuple and MAX_RETRIES=3 configuration
- Add real-time event emission for LLM requests/responses

Code Quality Fixes:
- Add thread locks (_graph_lock) to global mutable state in agents_graph_actions
- Fix ProxyManager singleton caching bug (was creating new instance each call)
- Add queue timeouts (QUEUE_TIMEOUT=120) in tool_server worker
- Convert request_queue from threading to asyncio primitives
- Add container cleanup on failure in docker_runtime
- Re-enable file-based logging in tool_server worker processes

New Features:
- Add real-time event streaming with verbose mode (-v/--verbose flag)
- Add --debug flag for full LiteLLM debug output
- Add progress tracking tools (save_progress, load_progress, list_progress)
- Add crash-proof persistence via events.jsonl

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Performance:
- Tool parallelization: read-only tools now execute concurrently
- Streaming LLM responses with early termination on </function>
- LLM response caching with configurable TTL and LRU eviction

Stability:
- Circuit breaker pattern for LLM API (prevents cascading failures)
- Worker health checks with automatic restart on death
- Graceful degradation for tool parsing failures
- Individual error isolation in parallel tool execution
- Response queue timeout to prevent indefinite blocking

Bug fixes:
- Fix asyncio Semaphore event loop binding issue (revert to threading)

New environment variables:
- LLM_STREAMING: enable/disable streaming (default: true)
- LLM_CACHE_ENABLED: enable response caching (default: true)
- LLM_CACHE_MAX_SIZE: max cached responses (default: 100)
- LLM_CACHE_TTL: cache TTL in seconds (default: 3600)
- LLM_CIRCUIT_FAILURE_THRESHOLD: failures before circuit opens (default: 5)
- LLM_CIRCUIT_RECOVERY_TIMEOUT: seconds before retry (default: 60)

Docs:
- Add scope configuration section to README

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Nov 23, 2025

Copy link
Copy Markdown

Walkthrough

This pull request introduces a comprehensive feature suite for Strix: a scope-based engagement configuration system, multi-layered LLM resilience (circuit breaker, response caching, streaming), agent state refactoring with status-driven control flow, real-time event tracing, role-based tool access control, and enhanced Docker/runtime support alongside extensive documentation and templates.

Changes

Cohort / File(s) Summary
Environment & Docker Setup
.env.example, Dockerfile.dev, docker-compose.dev.yml
Added development environment configuration template, Python 3.12 development image, and Docker Compose configuration for local testing with Poetry dependency management and volume mounts.
Documentation
README.md, docs/PROMPTING_GUIDE.md, docs/SCOPE_CONFIGURATION.md, cache-implementation-project.md
Added comprehensive guides for prompting architecture, scope configuration usage and CLI integration, and a phased cache implementation proposal for multi-agent coordination. Updated README with scope workflow examples.
Agent State Management Refactor
strix/agents/state.py, strix/agents/__init__.py
Introduced AgentStatus enum (RUNNING, WAITING_FOR_MESSAGE, WAITING_FOR_RECOVERY, COMPLETED, FAILED, STOPPED) and refactored AgentState to replace multiple boolean flags with single status field; added agent_role, failure_reason, consecutive_empty_responses tracking; updated state transition methods and preserved backward-compatible properties.
Agent Control Flow
strix/agents/base_agent.py
Reworked agent_loop with status-driven dispatch, new private helpers (_finalize_agent, _check_iteration_warnings, _handle_llm_error), empty-response tracking with MAX_EMPTY_RESPONSES constant, and tracer-logged state transitions.
Root Agent Initialization & Scope Context
strix/agents/StrixAgent/strix_agent.py
Added agent_role initialization and scope_context/exclusion_rules injection from scan_config into task_description as structured blocks.
LLM Circuit Breaker & Resilience
strix/llm/circuit_breaker.py, strix/llm/__init__.py
Introduced thread-safe CircuitBreaker class with configurable thresholds, HALF_OPEN state for recovery probes, CircuitBreakerError, global singleton get_llm_circuit_breaker, and STRIX_DEBUG-aware litellm debug control.
LLM Request & Response Handling
strix/llm/llm.py, strix/llm/config.py
Integrated circuit breaker checks, exponential backoff retry logic for transient errors, streaming support via _should_use_streaming and _make_streaming_request, response caching integration, tracer event emission, and role-aware system prompt rendering. Added agent_role and enable_streaming to LLMConfig.
Request Queue & Streaming
strix/llm/request_queue.py
Added environment-driven concurrency/delay configuration, _acquire_slot and _release_slot for atomic slot management, new make_streaming_request public method with optional chunk callbacks and stop conditions, and _iter_stream helper for unified async/sync streaming traversal.
Response Caching
strix/llm/response_cache.py
Introduced in-memory LRU cache with TTL support via ResponseCache class, CacheEntry dataclass, hit/miss/eviction statistics, and global singleton get_global_cache with environment-based configuration.
Scope Configuration System
strix/scope/models.py, strix/scope/parser.py, strix/scope/validator.py, strix/scope/__init__.py
Added comprehensive Pydantic models (ScopeMetadata, ScopeSettings, TargetDefinition, NetworkDefinition, etc.), ScopeConfig parser with YAML/JSON support and CIDR pre-computation, ScopeValidator with multi-phase validation (targets, networks, credentials, modules), and public API exports.
CLI & Interface Scope Integration
strix/interface/main.py, strix/interface/cli.py, strix/interface/tui.py
Added --scope, --filter, --validate arguments to load and filter targets from scope files; added verbose/debug flags; integrated scope_context into scan_config; introduced real-time CLI event formatting via _format_event_for_cli.
Telemetry Event Streaming
strix/telemetry/tracer.py
Introduced EventType enum, TracerEvent dataclass, real-time event stream with in-memory storage and JSONL persistence, structured logging methods (log_agent_iteration, log_llm_request, log_tool_event, etc.), event cursor tracking, and optional callback invocation.
Role-Based Tool Access Control
strix/tools/registry.py, strix/tools/executor.py
Introduced TOOL_PROFILES mapping roles to available tools, is_tool_allowed_for_role validator, role-aware prompt generation in get_tools_prompt, and enforcement in validate_tool_availability and tool execution. Added parallelization strategy distinguishing SEQUENTIAL_TOOLS with concurrent batch execution. Added telemetry hooks (_emit_tool_*_event).
Tool Storage & Progress Tracking
strix/tools/notes/notes_actions.py, strix/tools/progress/progress_actions.py, strix/tools/progress/__init__.py, strix/tools/reporting/reporting_actions.py
Added disk-persisted notes.json storage, new progress persistence utility with save_progress, load_progress, list_progress tools and atomic file I/O, and vulnerability disk persistence in reporting with CSV indexing.
Proxy & Graph Management
strix/tools/proxy/proxy_manager.py, strix/tools/__init__.py, strix/tools/agents_graph/agents_graph_actions.py, strix/tools/agents_graph/agents_graph_actions_schema.xml
Converted get_proxy_manager to true singleton pattern; exported TOOL_PROFILES and is_tool_allowed_for_role from registry; added re-entrant lock (_graph_lock) for thread-safe graph mutations; added agent_role and model parameters to create_agent with inheritance and fallback logic.
Runtime Improvements
strix/runtime/docker_runtime.py, strix/runtime/tool_server.py
Added container cleanup helper with error tolerance; enhanced worker loop with timeout-based queue polling, consecutive error tracking, and process restart logic; added QUEUE_TIMEOUT and RESPONSE_TIMEOUT with telemetry emission on tool execution.
Prompt Templates
strix/prompts/coordination/root_agent.jinja, strix/prompts/technologies/proxmox_ve.jinja
Expanded root coordinator prompt with operational modes, scope awareness, destructive action approval workflow, multi-model consensus, and phase-based agent hierarchy; added comprehensive Proxmox VE adversarial testing guide covering API endpoints, privilege escalation, CVEs, and bypass techniques.
Scope Configuration Templates
templates/scope/scope.yaml, templates/scope/scope.json, templates/scope/proxmox-cluster.yaml
Added engagement scope templates in YAML and JSON formats with metadata, networks, targets, exclusions, and domain definitions; provided Proxmox cluster assessment example with three nodes, credentials structure, and recon-only mode.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI
    participant ScopeParser
    participant ScopeValidator
    participant Agent
    participant LLM as LLM Handler
    participant CircuitBreaker as Circuit Breaker
    participant Cache as Response Cache
    
    User->>CLI: --scope scope.yaml --verbose
    CLI->>ScopeParser: from_file(scope.yaml)
    ScopeParser->>ScopeParser: parse YAML/JSON
    ScopeParser->>ScopeValidator: validate()
    ScopeValidator-->>ScopeParser: ValidationResult
    ScopeParser-->>CLI: ScopeConfig
    CLI->>Agent: initialize with scope_context
    
    Agent->>LLM: request (with agent_role)
    LLM->>CircuitBreaker: can_execute?
    alt CircuitBreaker.OPEN
        CircuitBreaker-->>LLM: CircuitBreakerError
        LLM->>Agent: retry with backoff
    else CircuitBreaker.CLOSED or HALF_OPEN
        LLM->>Cache: get(model, messages)
        alt cache hit
            Cache-->>LLM: ModelResponse
            LLM->>Agent: return cached
        else cache miss
            LLM->>LLM: make_request() with streaming?
            LLM->>CircuitBreaker: record_success()
            LLM->>Cache: put(model, messages, response)
            LLM->>Agent: return response
        end
    end
    
    Agent->>Agent: log_agent_iteration (tracer)
    Agent->>Agent: emit TracerEvent
    CLI->>CLI: _format_event_for_cli()
    CLI-->>User: real-time event output
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~70 minutes

  • strix/agents/state.py: Extensive refactoring of state machine; new AgentStatus enum and boolean-to-status migration patterns require careful validation of all transition paths.
  • strix/agents/base_agent.py: Status-driven dispatch logic and new private helpers introduce non-trivial control flow; MAX_EMPTY_RESPONSES handling and tracer integration need verification.
  • strix/llm/llm.py: Dense logic spanning circuit breaker checks, retry backoff, streaming, caching, and tracer events; validate error classification and recovery paths.
  • strix/scope/*: New subsystem with multi-phase validator, CIDR parsing, credential resolution, and filtering; verify validation rules are comprehensive and CIDR logic is correct.
  • strix/telemetry/tracer.py: Event streaming system with JSONL persistence and callbacks; verify atomicity of metadata saves and event ordering under concurrent emissions.
  • strix/tools/executor.py: Parallelization strategy for tools with sequential-vs-parallelizable distinction; verify batch execution preserves order and error propagation.
  • strix/prompts/coordination/root_agent.jinja: Significant restructuring; verify mode-based agent spawning logic and destructive action confirmation flow align with agent capabilities.

Possibly related PRs

  • refactor: simplify state management and improve robustness #1: Contains overlapping changes to AgentStatus/state refactoring, base_agent control-flow updates, tracer event streaming, request_queue/tool_server robustness fixes, and exported symbol visibility—directly related to this PR's core infrastructure changes.

Poem

🐰 A scope-bound journey through circuits and cache,
Events stream forth, no error to stash,
Roles guide the tools, agents coordinate true,
Status machines dance in COMPLETED debut,
Resilience blooms with each retry, each check—
Strix orchestrates safety, no breakage to wreck!

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Title check ⚠️ Warning The PR title 'Refactor/state management and robustness' is partially related to the changeset but does not accurately represent the primary change. While state management improvements are present (AgentStatus enum, state transitions), the changeset encompasses significantly broader architectural changes including scope configuration, circuit breakers, response caching, streaming support, progress tracking, documentation, Docker setup, and numerous other features. The title is too narrow and vague. Revise the title to reflect the full scope of changes. Consider 'Add scope configuration, state management, LLM robustness, and tooling improvements' or similar to capture the multi-faceted nature of this refactor, or break into smaller, more focused PRs by feature area.
Docstring Coverage ⚠️ Warning Docstring coverage is 49.75% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch refactor/state-management-and-robustness

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 13

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
strix/agents/base_agent.py (2)

157-219: Empty-response failure currently reported as a successful completion

When MAX_EMPTY_RESPONSES is exceeded in _process_iteration, self.state.set_failed(error_msg) is called and the tracer is updated to "failed", and the method returns True. In agent_loop, any should_finish=True is treated as a successful completion:

  • Non-interactive: it calls self.state.set_completed({"success": True}) and returns a success payload.
  • Interactive: it calls _enter_waiting_state(tracer, task_completed=True), which logs "Task completed successfully" and updates tracer status to "completed".

This overwrites the failed state and misreports the outcome.

One way to fix this without changing the MAX_EMPTY_RESPONSES behavior is to branch on self.state.status when should_finish is true:

-            try:
-                should_finish = await self._process_iteration(tracer)
-                if should_finish:
-                    if self.non_interactive:
-                        self.state.set_completed({"success": True})
-                        if tracer:
-                            tracer.update_agent_status(self.state.agent_id, "completed")
-                        return self.state.final_result or {}
-                    await self._enter_waiting_state(tracer, task_completed=True)
-                    continue
+            try:
+                should_finish = await self._process_iteration(tracer)
+                if should_finish:
+                    current_status = self.state.status
+                    if current_status == AgentStatus.FAILED:
+                        # Propagate failure
+                        if self.non_interactive:
+                            return self._finalize_agent(tracer, success=False)
+                        await self._enter_waiting_state(tracer, error_occurred=True)
+                        continue
+
+                    # Normal successful completion path
+                    if self.non_interactive:
+                        return self._finalize_agent(tracer, success=True)
+                    await self._enter_waiting_state(tracer, task_completed=True)
+                    continue

This preserves the success path for genuine completions (e.g., when tools call a finish action) while correctly surfacing the empty‑response failure.

Also applies to: 404-437


326-361: In _enter_waiting_state, tracer status is set to "stopped" even when new_state is "waiting"

In the final else branch of _enter_waiting_state, new_state is set to "waiting" and reason to "Execution paused", but update_agent_status is called with "stopped". This makes the state transition log say running -> waiting while the agent’s status in the tracer is "stopped".

A minimal correction:

        if tracer:
            if task_completed:
                new_state = "completed"
                reason = "Task completed successfully"
                tracer.update_agent_status(self.state.agent_id, "completed")
            elif error_occurred:
                new_state = "error"
                reason = "An error occurred"
                tracer.update_agent_status(self.state.agent_id, "error")
            elif was_cancelled:
                new_state = "stopped"
                reason = "Execution cancelled by user"
                tracer.update_agent_status(self.state.agent_id, "stopped")
            else:
-                new_state = "waiting"
-                reason = "Execution paused"
-                tracer.update_agent_status(self.state.agent_id, "stopped")
+                new_state = "waiting"
+                reason = "Execution paused"
+                tracer.update_agent_status(self.state.agent_id, "waiting")

Optionally, you may also want to derive old_state from self.state.status instead of hard‑coding "running" for more accurate transition telemetry.

🧹 Nitpick comments (42)
strix/tools/proxy/proxy_manager.py (1)

782-786: Consider adding a reset mechanism for testability.

The singleton pattern makes it difficult to test with different configurations, since ProxyManager.__init__ accepts an optional auth_token parameter but the singleton always initializes without arguments. Consider adding a reset function or accepting initialization parameters on first call for better test isolation.

Example approach for testability:

def reset_proxy_manager() -> None:
    """Reset the singleton instance. For testing only."""
    global _PROXY_MANAGER
    with _PROXY_MANAGER_LOCK:
        if _PROXY_MANAGER is not None:
            _PROXY_MANAGER.close()
        _PROXY_MANAGER = None
strix/runtime/docker_runtime.py (2)

80-92: Robust centralized container cleanup; consider narrowing suppression and reusing helper.

Centralizing stop/remove into _cleanup_failed_container looks good and should reduce orphaned containers; behavior is idempotent via the NotFound/DockerException handling. Two small follow‑ups you might consider:

  • Narrow contextlib.suppress(Exception) around container.stop to DockerException/NotFound (or the specific docker SDK error types) so unexpected programming errors during cleanup are not silently swallowed.
  • Given this helper, destroy_sandbox could delegate to _cleanup_failed_container (by passing the container_id), so all cleanup paths share the same tolerant behavior and logging.

Please double‑check against the docker SDK docs / your tests that Container.stop and Container.remove only raise the expected docker errors in your environment before narrowing the suppression.


93-161: Retry logic and state reset in _create_container_with_retry look correct; only minor logging refinements to consider.

The revised loop correctly:

  • Uses _cleanup_failed_container both before creation and on failures to avoid stale containers.
  • Clears _scan_container, _tool_server_port, and _tool_server_token on each failed attempt, so callers never see half‑initialized state.
  • Captures and re‑raises the last exception with context after exhausting retries.

A few optional refinements:

  • _verify_image_available already logs failures with stack traces; combined with the per‑attempt warning here plus the final logger.exception, logs might get noisy under persistent docker issues. You could consider reducing one of these to a non‑exception log or adding more structured context (e.g., attempt number, backoff delay).
  • On the final attempt you both call logger.exception(...) and then raise a RuntimeError chained from last_exception; ensure this double-reporting is what you want in upstream logging/telemetry.

Functionally, the changes look solid and align well with the PR’s robustness goals.

Please verify via your integration tests that upstream callers handle the new RuntimeError message shape and that the additional logging volume is acceptable under repeated docker failures.

strix/runtime/tool_server.py (5)

68-71: Timeouts on queue operations improve robustness; consider configurability

The introduction of QUEUE_TIMEOUT on the worker side and the non-blocking behavior on empty queues looks good and prevents workers from hanging indefinitely when the queue is broken or idle. You might want to consider making QUEUE_TIMEOUT configurable (e.g., via env var) so it can be tuned independently from RESPONSE_TIMEOUT for different deployment profiles, but the current defaults are reasonable.

Also applies to: 95-102


72-88: Worker file logging: good addition; consider configurable directory and stricter perms

Per‑worker file logging to /tmp/strix_workers is useful for post‑mortem analysis and decouples worker logs from the main process. To address static-analysis concerns and hardening:

  • Make the log directory configurable (e.g., STRIX_WORKER_LOG_DIR) so operators can redirect logs away from /tmp if desired.
  • Optionally create the directory with restricted permissions (0o700) to avoid other users on the same host reading worker logs, even though this is sandbox‑only today.

Example tweak:

-    log_dir = Path("/tmp/strix_workers")
-    log_dir.mkdir(exist_ok=True)
+    log_dir = Path(os.getenv("STRIX_WORKER_LOG_DIR", "/tmp/strix_workers"))
+    log_dir.mkdir(mode=0o700, exist_ok=True)

92-131: Add logging to catch‑all exception paths to make worker failures diagnosable

The consecutive_errors logic and differentiation between argument/tool/unknown errors is solid. Right now, though, both the inner generic except Exception and the outer “critical worker error” handler only communicate via the response queue and (for the outer one) may silently swallow exceptions if the queue is broken.

To aid debugging, it would be helpful to log these exceptions to the worker’s file logger as well:

-            except Exception as e:  # noqa: BLE001
-                # Catch-all for unexpected errors
-                consecutive_errors += 1
-                response_queue.put({"error": f"Unexpected error ({type(e).__name__}): {e}"})
+            except Exception as e:  # noqa: BLE001
+                # Catch-all for unexpected errors
+                logging.getLogger(__name__).exception(
+                    "Unexpected error in agent worker %s", _agent_id
+                )
+                consecutive_errors += 1
+                response_queue.put({"error": f"Unexpected error ({type(e).__name__}): {e}"})
                 if consecutive_errors >= max_consecutive_errors:
                     response_queue.put({"error": "Worker terminating due to repeated errors"})
                     break
@@
-        except Exception as e:  # noqa: BLE001
-            # Critical error in worker loop itself
-            try:
-                response_queue.put({"error": f"Critical worker error: {e}"})
-            except Exception:  # noqa: BLE001
-                pass  # Queue might be broken
-            break  # Exit worker on critical errors
+        except Exception as e:  # noqa: BLE001
+            # Critical error in worker loop itself
+            logging.getLogger(__name__).exception(
+                "Critical worker error in agent worker %s", _agent_id
+            )
+            try:
+                response_queue.put({"error": f"Critical worker error: {e}"})
+            except Exception:  # noqa: BLE001
+                # Queue might be broken
+                pass
+            break  # Exit worker on critical errors

This keeps the API surface unchanged but makes repeated/unexpected failures much easier to root‑cause, and addresses the tryexceptpass static‑analysis warnings in a meaningful way.

Also applies to: 133-139


158-172: Tighten cleanup helper and simplify dict access with pop + optional logging

_cleanup_dead_process does the right thing functionally, but you can both satisfy the RUF051 hint and gain a bit more observability:

  • Use .pop(..., None) instead of if key in dict: del dict[key].
  • Consider logging (at debug level) when termination fails instead of bare except/pass.

Example:

 def _cleanup_dead_process(agent_id: str) -> None:
     """Clean up resources for a dead worker process."""
-    if agent_id in agent_processes:
-        try:
-            process = agent_processes[agent_id]["process"]
-            if process.is_alive():
-                process.terminate()
-                process.join(timeout=1)
-        except Exception:  # noqa: BLE001
-            pass
-        del agent_processes[agent_id]
-
-    if agent_id in agent_queues:
-        del agent_queues[agent_id]
+    process_info = agent_processes.pop(agent_id, None)
+    if process_info is not None:
+        try:
+            process = process_info["process"]
+            if process.is_alive():
+                process.terminate()
+                process.join(timeout=1)
+        except Exception:  # noqa: BLE001
+            logging.getLogger(__name__).debug(
+                "Error while cleaning up dead agent process %s", agent_id, exc_info=True
+            )
+
+    # Remove queues if present; they will be GC'ed once unreferenced
+    agent_queues.pop(agent_id, None)

This keeps behavior the same while improving clarity and debuggability.


191-191: HTTP response timeout and expanded exception handling are sensible; consider distinguishing worker‑death cases

Using RESPONSE_TIMEOUT with a blocking response_queue.get offloaded to a thread keeps the event loop unblocked and ensures /execute requests can’t hang indefinitely. The generic Exception catch returning "Unexpected error: ..." is also a good last‑resort guard.

One possible incremental improvement would be to special‑case EOFError/OSError from response_queue.get (which likely indicate the worker process or its IPC channel died) and return a clearer error such as "Worker process for agent X died while waiting for response", possibly coupled with a _cleanup_dead_process(agent_id) call to eagerly reset state for the next invocation. That can be done inside get_response_with_timeout without changing the external API.

Also applies to: 204-216, 223-224

.env.example (1)

4-15: Clarify example defaults and mention streaming flag

The example uses STRIX_LLM=zhipu/glm-4, while LLMConfig falls back to "openai/gpt-5" when STRIX_LLM is unset, and also supports LLM_STREAMING. To avoid confusion for users copying this file, consider either:

  • Documenting that the code default is "openai/gpt-5" when STRIX_LLM is omitted, or
  • Aligning the code default with this example, and
  • Adding an (optional) LLM_STREAMING=true|false line here for discoverability.
strix/tools/notes/notes_actions.py (2)

17-47: Tidy up global handling and remove unused noqa directives

The global usage here is reasonable, but the # noqa: PLW0603 comments at Lines 19 and 38 are now flagged as unused by Ruff (RUF100). Since PLW0603 isn’t enabled, they add noise without benefit. Consider removing those noqa directives to keep lint output clean.


49-70: Improve error diagnostics in _save_notes_to_disk

The atomic write pattern looks good. For better debugging when disk writes fail and to satisfy the Ruff hint:

  • Use logger.exception instead of logger.error in the except block so the traceback is captured.
  • Optionally refactor to return outside the try (per TRY300), but that’s stylistic.

Example:

-    except (OSError, IOError) as e:
-        logger.error(f"Failed to save notes to disk: {e}")
-        return False
+    except (OSError, IOError):
+        logger.exception("Failed to save notes to disk")
+        return False
strix/prompts/technologies/proxmox_ve.jinja (1)

1-274: Consider adding an explicit authorization/scope disclaimer

The guide is comprehensive and adversarial in nature. To align with safe use, consider adding a brief reminder near the top (e.g., in <scope> or a new <legal> block) that all testing must be performed only against systems that are explicitly in scope and with proper authorization. This keeps the prompt aligned with expected legal/ethical boundaries.

strix/llm/config.py (1)

11-12: Streaming flag behavior is sound; align documentation and examples

The new enable_streaming parameter with LLM_STREAMING fallback is straightforward and keeps streaming enabled by default. Two small follow-ups to consider:

  • Ensure .env.example (and any docs) mention LLM_STREAMING so users discover the override mechanism.
  • Decide whether you want the hardcoded "openai/gpt-5" default here to match the .env.example recommendation (STRIX_LLM=zhipu/glm-4) or explicitly document that they differ.

Functionally this looks fine as-is.

Also applies to: 21-21, 23-29

strix/tools/registry.py (1)

15-155: Role-based filtering logic in is_tool_allowed_for_role / get_tools_prompt looks correct

The interplay between TOOL_PROFILES, is_tool_allowed_for_role, and the updated get_tools_prompt(role) is consistent:

  • role=None and unknown roles remain fully permissive (backward compatible).
  • Known roles use their configured allow-lists, with None meaning “no restriction”.
  • get_tools_prompt filters the XML schema output using the same allow-lists, so prompts and enforcement stay aligned.

The main maintenance cost here is the duplicated lists across profiles; if this grows, consider factoring common subsets (e.g., shared “notes” or “progress” groups) to reduce repetition, but it’s not required for correctness now.

Also applies to: 349-376

strix/tools/progress/__init__.py (1)

5-5: Consider alphabetically sorting __all__ for consistency.

The static analysis tool suggests sorting __all__ entries alphabetically. While not strictly required, this convention improves maintainability and reduces merge conflicts.

Apply this diff:

-__all__ = ["save_progress", "load_progress", "list_progress"]
+__all__ = ["list_progress", "load_progress", "save_progress"]
Dockerfile.dev (1)

17-17: Consider pinning versions for Docker CLI and Poetry.

Currently, both Docker CLI and Poetry are installed without explicit version constraints, which could lead to unexpected behavior if breaking changes occur in future releases.

Consider pinning specific versions:

-    && apt-get install -y --no-install-recommends docker-ce-cli \
+    && apt-get install -y --no-install-recommends docker-ce-cli=5:27.* \
-RUN curl -sSL https://install.python-poetry.org | python3 - && \
+RUN curl -sSL https://install.python-poetry.org | POETRY_VERSION=1.8.5 python3 - && \

Also applies to: 21-21

strix/llm/__init__.py (1)

11-20: Consider alphabetically sorting __all__ for consistency.

Similar to other modules in the codebase, sorting the __all__ list improves maintainability.

Apply this diff:

 __all__ = [
+    "CircuitBreaker",
+    "CircuitBreakerError",
     "LLM",
     "LLMConfig",
     "LLMRequestFailedError",
-    "CircuitBreaker",
-    "CircuitBreakerError",
-    "get_llm_circuit_breaker",
     "ResponseCache",
     "get_global_cache",
+    "get_llm_circuit_breaker",
 ]
strix/agents/StrixAgent/strix_agent.py (1)

27-27: Remove unused noqa directive.

Static analysis indicates the noqa: PLR0912 directive is unused, suggesting the method's complexity may have been reduced or the rule is not enabled in the linter configuration.

Apply this diff:

-    async def execute_scan(self, scan_config: dict[str, Any]) -> dict[str, Any]:  # noqa: PLR0912
+    async def execute_scan(self, scan_config: dict[str, Any]) -> dict[str, Any]:
docs/PROMPTING_GUIDE.md (1)

175-175: Optional: Minor style improvement.

The phrase "Reconnaissance only, no exploitation" repeats "only" in close proximity. Consider rephrasing for better readability, though the current wording is clear.

Suggested alternative:

-| `recon only` | Reconnaissance only, no exploitation |
+| `recon only` | Reconnaissance without exploitation |
strix/scope/__init__.py (1)

17-34: Optional: Consider sorting __all__ for consistency.

The static analysis tool suggests sorting __all__ in isort-style. While the current grouping by category (Main class, Models, Validation) is logical and aids readability, alphabetical sorting within each group could improve maintainability.

Apply this diff if you prefer full alphabetical sorting:

 __all__ = [
-    # Main class
+    # Validation
+    "ValidationResult",
     "ScopeConfig",
+    "validate_scope",
+    "ScopeValidator",
     # Models
+    "CredentialDefinition",
+    "DomainScope",
+    "Exclusions",
+    "NetworkDefinition",
     "ScopeConfigModel",
     "ScopeMetadata",
     "ScopeSettings",
-    "NetworkDefinition",
+    "ServiceDefinition",
     "TargetDefinition",
-    "ServiceDefinition",
-    "CredentialDefinition",
-    "Exclusions",
-    "DomainScope",
-    # Validation
-    "ScopeValidator",
-    "ValidationResult",
-    "validate_scope",
 ]

Alternatively, keep the current semantic grouping and just sort within each section.

cache-implementation-project.md (2)

15-15: Optional: Add language identifiers to code blocks.

Several code blocks lack language specifiers, which affects syntax highlighting and readability. Consider adding appropriate language tags.

Examples:

-```
+```text
 /workspace/.strix/
 ├── cache/
 ...

- +text
Pros:


-```
+```text
**Cons:**


Also applies to: 81-81, 99-100, 105-106, 110-111, 136-136, 224-224, 310-310

---

`262-262`: **Optional: Add blank lines around tables.**

Markdown tables should be surrounded by blank lines for better rendering in some parsers. Consider adding blank lines before and after tables in the "Performance Considerations" section.




Also applies to: 280-280, 287-287, 293-293, 298-298

</blockquote></details>
<details>
<summary>strix/interface/cli.py (3)</summary><blockquote>

`22-142`: **Tracer event CLI formatting covers major event types; consider extending to remaining ones later**

The formatter cleanly handles agent iterations, LLM requests/responses/errors, tool lifecycle, state transitions, and inter-agent messaging, with sensible truncation and coloring. If you start emitting `VULNERABILITY_FOUND` or scan-level events via the tracer, you may want to add cases here so verbose mode shows those too, rather than silently dropping them.

---

`45-59`: **Consider treating `cost == 0` as a valid value when formatting LLM responses**

Right now:

```python
cost = data.get("cost")
...
cost_str = f", ${cost:.4f}" if cost else ""

If cost is 0.0, nothing is shown even though it’s a real value. If you’d like to display zero-cost responses explicitly, you could gate on is not None instead:

-        cost = data.get("cost")
+        cost = data.get("cost")
...
-        cost_str = f", ${cost:.4f}" if cost else ""
+        cost_str = f", ${cost:.4f}" if cost is not None else ""

250-259: Verbose flag/env parsing is quite strict; consider accepting more truthy variants

verbose_mode only turns on when args.verbose is truthy or STRIX_VERBOSE.lower() == "true". If you expect operators to use values like "1", "yes", or "on", you might normalize with a small helper (e.g., checking membership in a set of accepted truthy strings) rather than a single literal.

strix/tools/reporting/reporting_actions.py (2)

13-60: Disk persistence helper is solid; log full traceback on failure for easier debugging

The helper correctly creates a per-run vulnerabilities/ dir, writes a markdown file, and maintains a CSV index. In the failure path you currently log only the message:

except (OSError, IOError) as e:
    logger.error(f"Failed to save vulnerability {report_id} to disk: {e}")

Switching to logger.exception (or logger.error(..., exc_info=True)) would preserve the traceback and make field debugging of filesystem/permission issues much easier:

-    except (OSError, IOError) as e:
-        logger.error(f"Failed to save vulnerability {report_id} to disk: {e}")
+    except (OSError, IOError):
+        logger.exception("Failed to save vulnerability %s to disk", report_id)

86-124: Use the module logger for tracer-unavailable warnings for consistency

You define logger = logging.getLogger(__name__) but later use the root logger:

logging.warning("Global tracer not available - vulnerability report not stored")

For consistency and easier log routing, consider switching to the module logger:

-        logging.warning("Global tracer not available - vulnerability report not stored")
+        logger.warning("Global tracer not available - vulnerability report not stored")
strix/tools/agents_graph/agents_graph_actions_schema.xml (1)

85-90: New agent_role and model parameters are well-documented; consider adding an explicit model example

The descriptions for agent_role and model are detailed and give the LLM good guidance on how to choose roles/models. To fully exercise the new model parameter in the schema, you might add one example that explicitly sets a non-default model (e.g., a faster recon model vs. a heavier testing model), so tools and prompt-tuning see how it should be used in practice.

Also applies to: 96-126

strix/llm/response_cache.py (2)

35-48: max_size / ttl_seconds or semantics prevent explicitly passing 0

Using max_size or int(os.environ.get(...)) and ttl_seconds or float(os.environ.get(...)) means that passing 0 will be treated the same as None (you get the env/default instead):

self.max_size = max_size or int(os.environ.get("LLM_CACHE_MAX_SIZE", "100"))
self.ttl_seconds = ttl_seconds or float(os.environ.get("LLM_CACHE_TTL", "3600"))

If you ever need to distinguish “unset” from an explicit 0 (e.g., ttl_seconds=0 meaning “no caching” or “immediately expire”), consider using is None checks instead of truthiness.


155-159: Unused noqa on global in get_global_cache

The global _global_cache # noqa: PLW0603 line disables a warning that doesn’t appear to be enabled (Ruff reports the noqa as unused). You can safely drop the # noqa suffix here to keep lint output clean.

strix/llm/circuit_breaker.py (2)

114-137: Use the error parameter in record_failure or remove it

record_failure accepts an error argument but never uses it:

def record_failure(self, error: Exception | None = None) -> None:
    ...
    if self._state == CircuitState.HALF_OPEN:
        ...

Given you already log when the circuit opens or recovery fails, it would be useful to include the exception context there (e.g., via logger.exception or logger.warning(..., exc_info=error is not None)). If you don’t intend to log the exception at this level, consider dropping the parameter to avoid confusion and the unused-argument lint.


180-185: Unused noqa on global in get_llm_circuit_breaker

Similar to the cache module, global _llm_circuit_breaker # noqa: PLW0603 disables a warning that doesn’t seem to be active. Removing the # noqa keeps the file cleaner without changing behavior.

strix/tools/progress/progress_actions.py (2)

25-47: # noqa: PLW0603 markers on globals appear unnecessary

Both _get_progress_file and _load_progress_from_disk declare globals with # noqa: PLW0603:

global _progress_file_path  # noqa: PLW0603
...
global _progress_cache  # noqa: PLW0603

Ruff reports these noqa comments as unused. Unless you have a specific linter configuration that still warns on these, you can drop the # noqa suffixes and keep the explicit global statements.


61-83: Atomic progress persistence is nice; log full tracebacks on failure

The atomic write via a temp file and Path.replace is a good choice. In the failure path you currently do:

except (OSError, IOError) as e:
    logger.error(f"Failed to save progress to disk: {e}")

Switching to logger.exception (or exc_info=True) would capture the traceback, which is very helpful for diagnosing disk/permission issues:

-    except (OSError, IOError) as e:
-        logger.error(f"Failed to save progress to disk: {e}")
+    except (OSError, IOError):
+        logger.exception("Failed to save progress to disk")
strix/llm/llm.py (3)

320-376: Retry loop + circuit breaker + tracing flow looks sound, but note stacked retries with queue-level logic

The overall structure in generate()—history compression, circuit-breaker gating, retry loop around _make_request, and request/response tracing—is coherent and keeps transient vs non‑retryable errors clearly separated. One thing to be aware of is that this MAX_RETRIES loop wraps LLMRequestQueue._reliable_request, which is itself tenacity‑retried, so a single high‑level failure can represent multiple underlying provider calls. If that’s intentional, consider documenting the effective upper bound on attempts (queue retries × MAX_RETRIES + 1) so operators understand worst‑case latency and error amplification; otherwise, you may want to consolidate retry control into one layer.

Also applies to: 352-459


536-552: Streaming model gating is overly broad; "o4" substring will also match non‑reasoning models

_should_use_streaming() disables streaming for any model name containing "o1", "o3" or "o4". This will also catch models like gpt-4o/gpt-4o-mini, which are not the same as the o4* reasoning models and typically stream fine.

Consider tightening this to explicit model patterns, e.g.:

-        model_lower = self.config.model_name.lower()
-        no_stream_patterns = ["o1", "o3", "o4"]
-        if any(pat in model_lower for pat in no_stream_patterns):
+        model_lower = self.config.model_name.lower()
+        no_stream_patterns = [
+            "o1",           # legacy o1 alias
+            "o1-",          # specific dated variants
+            "o3-",
+            "o4-mini",      # reasoning models
+        ]
+        if any(model_lower == pat or model_lower.startswith(pat) for pat in no_stream_patterns):
             return False

or by reusing model_matches with a dedicated reasoning‑models pattern list for streaming decisions.


432-435: TRY003 (long error messages in raise calls) is stylistic; consider centralizing if you want Ruff clean

Ruff’s TRY003 warnings here are about building long/finalized error messages directly at the raise sites. Behavior is correct; if you care about a clean lint run, one option is to centralize these into helpers (or into LLMRequestFailedError) and call them from the exception sites to avoid repeating formatting.

Also applies to: 451-452, 460-467

strix/llm/request_queue.py (1)

31-38: Numeric config using or makes it impossible to pass 0 for delay via arguments

The __init__ uses max_concurrent or ... and delay_between_requests or .... For delay_between_requests, this means a caller cannot explicitly request 0 seconds via the argument (it will be treated as falsy and replaced with the env/default), even though 0.0 is a valid “no delay” value.

A slightly safer pattern is to distinguish None from 0:

-    def __init__(self, max_concurrent: int | None = None, delay_between_requests: float | None = None):
-        self.max_concurrent = max_concurrent or int(os.environ.get("LLM_RATE_LIMIT_CONCURRENT", "6"))
-        self.delay_between_requests = delay_between_requests or float(
-            os.environ.get("LLM_RATE_LIMIT_DELAY", "1.0")
-        )
+    def __init__(self, max_concurrent: int | None = None, delay_between_requests: float | None = None):
+        self.max_concurrent = (
+            max_concurrent
+            if max_concurrent is not None
+            else int(os.environ.get("LLM_RATE_LIMIT_CONCURRENT", "6"))
+        )
+        self.delay_between_requests = (
+            delay_between_requests
+            if delay_between_requests is not None
+            else float(os.environ.get("LLM_RATE_LIMIT_DELAY", "1.0"))
+        )
strix/interface/main.py (1)

407-475: Scope loading/validation and filter application look robust and user-friendly

_load_scope_file cleanly separates load, validate, and (optional) validation‑only output with good error messages, and _apply_scope_filters enforces a simple, explicit key:value filter syntax with immediate parser errors on bad input. The in‑place replacement of scope_config.model.targets keeps downstream behavior consistent while still allowing filtered views.

You can address Ruff’s F541 by changing the f-string without placeholders:

-        console.print(f"\n[bold cyan]Scope Summary:[/]")
+        console.print("\n[bold cyan]Scope Summary:[/]")

but this is purely stylistic.

strix/scope/parser.py (2)

133-139: Consider defensive path resolution for local_code targets.

Line 134 calls Path(target_value).resolve() which could raise OSError or RuntimeError if the path is inaccessible or contains cycles. While this may be acceptable for configuration validation, consider whether you want to fail fast or handle gracefully.

If graceful handling is preferred:

         elif target_type == "local_code":
-            resolved_path = str(Path(target_value).resolve())
+            try:
+                resolved_path = str(Path(target_value).resolve())
+            except (OSError, RuntimeError):
+                # Fall back to original path if resolution fails
+                resolved_path = str(Path(target_value).absolute())
             info = {
                 "type": "local_code",
                 "details": {"target_path": resolved_path},
                 "original": target_value,
             }

213-220: Minor: Unused loop variable _name.

Line 216 unpacks _name from the dict items but never uses it. Consider using .values() directly for clarity.

         # Check if IP is in any network CIDR
         try:
             ip = ipaddress.ip_address(target)
-            for _name, network in self._network_cidrs.items():
+            for network in self._network_cidrs.values():
                 if ip in network:
                     return True
         except ValueError:
             pass  # Not an IP
strix/telemetry/tracer.py (1)

452-456: Remove unused noqa directive.

Line 455 has a noqa: BLE001 comment but is using except Exception:, not a bare except:. The directive is unnecessary.

         if self.event_callback:
             try:
                 self.event_callback(event)
-            except Exception:  # noqa: BLE001
+            except Exception:
                 logger.exception("Error in event callback")

Based on static analysis hints.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between a11cd09 and f462b97.

⛔ Files ignored due to path filters (1)
  • templates/scope/scope-simple.csv is excluded by !**/*.csv
📒 Files selected for processing (42)
  • .env.example (1 hunks)
  • Dockerfile.dev (1 hunks)
  • README.md (1 hunks)
  • cache-implementation-project.md (1 hunks)
  • docker-compose.dev.yml (1 hunks)
  • docs/PROMPTING_GUIDE.md (1 hunks)
  • docs/SCOPE_CONFIGURATION.md (1 hunks)
  • strix/agents/StrixAgent/strix_agent.py (2 hunks)
  • strix/agents/__init__.py (1 hunks)
  • strix/agents/base_agent.py (8 hunks)
  • strix/agents/state.py (6 hunks)
  • strix/interface/cli.py (3 hunks)
  • strix/interface/main.py (4 hunks)
  • strix/interface/tui.py (1 hunks)
  • strix/llm/__init__.py (1 hunks)
  • strix/llm/circuit_breaker.py (1 hunks)
  • strix/llm/config.py (2 hunks)
  • strix/llm/llm.py (7 hunks)
  • strix/llm/request_queue.py (3 hunks)
  • strix/llm/response_cache.py (1 hunks)
  • strix/prompts/coordination/root_agent.jinja (1 hunks)
  • strix/prompts/technologies/proxmox_ve.jinja (1 hunks)
  • strix/runtime/docker_runtime.py (2 hunks)
  • strix/runtime/tool_server.py (3 hunks)
  • strix/scope/__init__.py (1 hunks)
  • strix/scope/models.py (1 hunks)
  • strix/scope/parser.py (1 hunks)
  • strix/scope/validator.py (1 hunks)
  • strix/telemetry/tracer.py (6 hunks)
  • strix/tools/__init__.py (3 hunks)
  • strix/tools/agents_graph/agents_graph_actions.py (10 hunks)
  • strix/tools/agents_graph/agents_graph_actions_schema.xml (1 hunks)
  • strix/tools/executor.py (7 hunks)
  • strix/tools/notes/notes_actions.py (7 hunks)
  • strix/tools/progress/__init__.py (1 hunks)
  • strix/tools/progress/progress_actions.py (1 hunks)
  • strix/tools/proxy/proxy_manager.py (1 hunks)
  • strix/tools/registry.py (2 hunks)
  • strix/tools/reporting/reporting_actions.py (2 hunks)
  • templates/scope/proxmox-cluster.yaml (1 hunks)
  • templates/scope/scope.json (1 hunks)
  • templates/scope/scope.yaml (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (16)
strix/tools/progress/__init__.py (1)
strix/tools/progress/progress_actions.py (3)
  • list_progress (199-243)
  • load_progress (158-195)
  • save_progress (87-154)
strix/scope/__init__.py (3)
strix/scope/models.py (9)
  • CredentialDefinition (52-58)
  • DomainScope (145-149)
  • Exclusions (135-142)
  • NetworkDefinition (61-77)
  • ScopeConfigModel (152-167)
  • ScopeMetadata (8-24)
  • ScopeSettings (27-41)
  • ServiceDefinition (44-49)
  • TargetDefinition (80-132)
strix/scope/parser.py (1)
  • ScopeConfig (25-457)
strix/scope/validator.py (3)
  • ScopeValidator (30-208)
  • ValidationResult (12-27)
  • validate_scope (211-214)
strix/agents/base_agent.py (3)
strix/agents/state.py (10)
  • AgentState (24-204)
  • AgentStatus (13-21)
  • should_stop (109-114)
  • increment_iteration (57-59)
  • is_approaching_max_iterations (139-140)
  • add_message (61-63)
  • add_error (83-85)
  • set_failed (97-102)
  • enter_waiting_state (119-125)
  • llm_failed (164-165)
strix/telemetry/tracer.py (6)
  • get_global_tracer (69-70)
  • log_agent_iteration (458-475)
  • update_agent_status (254-261)
  • log_tool_execution_start (222-244)
  • update_tool_execution (246-252)
  • log_agent_state_transition (477-495)
strix/llm/llm.py (1)
  • LLMRequestFailedError (56-60)
strix/agents/StrixAgent/strix_agent.py (2)
strix/llm/config.py (1)
  • LLMConfig (4-29)
strix/scope/parser.py (2)
  • targets (448-449)
  • networks (444-445)
strix/interface/tui.py (1)
strix/scope/parser.py (2)
  • get_agent_context (365-391)
  • get_exclusion_rules (354-363)
strix/tools/progress/progress_actions.py (2)
strix/tools/registry.py (1)
  • register_tool (240-287)
strix/telemetry/tracer.py (2)
  • get_global_tracer (69-70)
  • get_run_dir (119-128)
strix/tools/notes/notes_actions.py (1)
strix/telemetry/tracer.py (2)
  • get_global_tracer (69-70)
  • get_run_dir (119-128)
strix/tools/agents_graph/agents_graph_actions.py (3)
strix/agents/state.py (2)
  • stop_requested (156-157)
  • AgentState (24-204)
strix/llm/config.py (1)
  • LLMConfig (4-29)
strix/telemetry/tracer.py (2)
  • get_global_tracer (69-70)
  • log_agent_message_sent (569-586)
strix/telemetry/tracer.py (1)
strix/interface/cli.py (1)
  • event_callback (255-256)
strix/scope/models.py (1)
strix/scope/parser.py (6)
  • metadata (436-437)
  • settings (440-441)
  • networks (444-445)
  • targets (448-449)
  • exclusions (452-453)
  • domains (456-457)
strix/scope/parser.py (1)
strix/scope/models.py (10)
  • CredentialDefinition (52-58)
  • DomainScope (145-149)
  • Exclusions (135-142)
  • NetworkDefinition (61-77)
  • ScopeConfigModel (152-167)
  • ScopeMetadata (8-24)
  • ScopeSettings (27-41)
  • TargetDefinition (80-132)
  • get_target_type (120-132)
  • get_target_value (116-118)
strix/agents/__init__.py (1)
strix/agents/state.py (2)
  • AgentState (24-204)
  • AgentStatus (13-21)
strix/scope/validator.py (4)
strix/scope/models.py (2)
  • ScopeConfigModel (152-167)
  • get_target_value (116-118)
strix/agents/state.py (1)
  • add_error (83-85)
strix/scope/parser.py (3)
  • targets (448-449)
  • networks (444-445)
  • exclusions (452-453)
strix/prompts/__init__.py (1)
  • get_all_module_names (25-29)
strix/tools/reporting/reporting_actions.py (2)
strix/tools/registry.py (1)
  • register_tool (240-287)
strix/telemetry/tracer.py (1)
  • get_run_dir (119-128)
strix/interface/cli.py (3)
strix/telemetry/tracer.py (4)
  • EventType (18-46)
  • Tracer (78-659)
  • TracerEvent (50-64)
  • set_global_tracer (73-75)
strix/interface/utils.py (1)
  • get_severity_color (30-38)
strix/scope/parser.py (2)
  • get_agent_context (365-391)
  • get_exclusion_rules (354-363)
strix/runtime/tool_server.py (2)
strix/tools/argument_parser.py (2)
  • ArgumentConversionError (9-12)
  • convert_arguments (15-47)
strix/tools/registry.py (1)
  • get_tool_by_name (290-291)
🪛 LanguageTool
docs/PROMPTING_GUIDE.md

[style] ~175-~175: This adverb was used twice in the sentence. Consider removing one of them or replacing them with a synonym.
Context: ...------| | recon only | Reconnaissance only, no exploitation | | poc only | Gener...

(ADVERB_REPETITION_PREMIUM)

🪛 markdownlint-cli2 (0.18.1)
cache-implementation-project.md

15-15: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


81-81: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


86-86: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


99-99: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


100-100: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


105-105: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


106-106: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


110-110: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


111-111: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


136-136: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


224-224: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


262-262: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)


280-280: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)


287-287: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)


293-293: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)


298-298: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)


310-310: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Ruff (0.14.5)
strix/tools/progress/__init__.py

5-5: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

strix/scope/__init__.py

17-34: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

strix/llm/circuit_breaker.py

114-114: Unused method argument: error

(ARG002)


141-145: Avoid specifying long messages outside the exception class

(TRY003)


182-182: Unused noqa directive (non-enabled: PLW0603)

Remove unused noqa directive

(RUF100)

strix/agents/StrixAgent/strix_agent.py

27-27: Unused noqa directive (non-enabled: PLR0912)

Remove unused noqa directive

(RUF100)

strix/tools/progress/progress_actions.py

27-27: Unused noqa directive (non-enabled: PLW0603)

Remove unused noqa directive

(RUF100)


46-46: Unused noqa directive (non-enabled: PLW0603)

Remove unused noqa directive

(RUF100)


79-79: Consider moving this statement to an else block

(TRY300)


82-82: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


146-151: Consider moving this statement to an else block

(TRY300)

strix/llm/response_cache.py

157-157: Unused noqa directive (non-enabled: PLW0603)

Remove unused noqa directive

(RUF100)

strix/tools/notes/notes_actions.py

19-19: Unused noqa directive (non-enabled: PLW0603)

Remove unused noqa directive

(RUF100)


38-38: Unused noqa directive (non-enabled: PLW0603)

Remove unused noqa directive

(RUF100)


66-66: Consider moving this statement to an else block

(TRY300)


69-69: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

strix/telemetry/tracer.py

455-455: Unused noqa directive (unused: BLE001)

Remove unused noqa directive

(RUF100)

strix/llm/llm.py

432-435: Avoid specifying long messages outside the exception class

(TRY003)


451-451: Avoid specifying long messages outside the exception class

(TRY003)


458-458: Avoid specifying long messages outside the exception class

(TRY003)


462-465: Avoid specifying long messages outside the exception class

(TRY003)


466-466: Avoid specifying long messages outside the exception class

(TRY003)

strix/scope/models.py

23-23: Avoid specifying long messages outside the exception class

(TRY003)


40-40: Avoid specifying long messages outside the exception class

(TRY003)


76-76: Avoid specifying long messages outside the exception class

(TRY003)

strix/scope/parser.py

66-66: Avoid specifying long messages outside the exception class

(TRY003)


75-75: Avoid specifying long messages outside the exception class

(TRY003)

strix/llm/__init__.py

11-20: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

strix/interface/main.py

424-424: Do not catch blind exception: Exception

(BLE001)


448-448: f-string without any placeholders

Remove extraneous f prefix

(F541)

strix/tools/executor.py

82-82: Avoid specifying long messages outside the exception class

(TRY003)


85-87: Avoid specifying long messages outside the exception class

(TRY003)


338-338: Local variable stack_trace is assigned to but never used

Remove assignment to unused variable stack_trace

(F841)


407-407: Consider moving this statement to an else block

(TRY300)

strix/tools/reporting/reporting_actions.py

56-56: Consider moving this statement to an else block

(TRY300)


59-59: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

strix/runtime/tool_server.py

77-77: Probable insecure usage of temporary file or directory: "/tmp/strix_workers"

(S108)


137-138: try-except-pass detected, consider logging the exception

(S110)


166-167: try-except-pass detected, consider logging the exception

(S110)


171-171: Use pop instead of key in dict followed by del dict[key]

Replace if statement with .pop(..., None)

(RUF051)

🔇 Additional comments (30)
strix/tools/proxy/proxy_manager.py (1)

779-780: Singleton storage introduced.

The module-level _PROXY_MANAGER variable enables singleton behavior for the proxy manager, which aligns with the PR's state management refactoring goals.

strix/runtime/tool_server.py (2)

142-156: Agent process creation helper looks good; minor note on daemon processes

The _create_agent_process helper nicely centralizes worker creation and queue registration; using it from ensure_agent_process simplifies the lifecycle. Using daemon=True is fine here given you also have explicit cleanup in cleanup_all_agents, and it ensures workers don’t survive the main process unexpectedly.


175-188: Agent process health check and restart behavior LGTM

The updated ensure_agent_process behavior—restarting the worker when process.is_alive() is False and otherwise reusing existing queues—is clear and matches the docstring. The warning log on restart, plus the separation into _create_agent_process and _cleanup_dead_process, make the agent lifecycle much easier to follow and reason about.

strix/tools/notes/notes_actions.py (1)

117-120: On-demand disk loading and persisted_to_disk flag behavior looks good

The pattern of:

  • Lazy-loading from disk on first access when _notes_storage is empty, and
  • Immediately calling _save_notes_to_disk() on create/update/delete and surfacing persisted_to_disk in the response

is coherent and should make failure modes visible to callers without breaking in‑memory behavior. No functional issues stand out here.

Also applies to: 158-160, 180-183, 212-215, 245-247, 261-264, 271-273, 278-281

strix/tools/progress/__init__.py (1)

1-5: LGTM! Clean package initialization.

The re-export pattern is correctly implemented and the module provides a clear interface for progress tracking functionality.

README.md (1)

199-236: Excellent documentation for the new Scope Configuration feature.

The section provides clear examples, command usage patterns, and references to templates. The documentation effectively guides users through complex assessment scenarios with scope files.

strix/agents/__init__.py (1)

2-2: LGTM! Clean addition to the public API.

The AgentStatus enum is now properly exposed, supporting the status-driven agent control flow introduced in this PR.

Also applies to: 8-8

docker-compose.dev.yml (2)

18-18: Verify that Docker socket and host networking are intended for development only.

The configuration grants the container access to the Docker daemon via socket mount and removes network isolation with network_mode: host. While common in development environments (especially for spawning sandbox containers), these settings have security implications:

  • Docker socket access allows full control over the host's Docker daemon
  • Host networking bypasses container network isolation

Ensure this configuration is documented as development-only and not used in production or CI/CD environments.

Also applies to: 25-25


10-10: Good practice: Using read-only mounts for source code.

The source code, target directory, and templates are correctly mounted as read-only (:ro), preventing accidental modifications from within the container.

Also applies to: 12-12, 14-14

strix/interface/tui.py (1)

313-325: LGTM! Clean scope configuration integration.

The TUI now correctly augments the scan configuration with scope context and exclusion rules when a scope file is provided. The conditional logic properly checks for the attribute's presence before accessing it.

Dockerfile.dev (1)

28-38: Good practice: Optimized layer caching for dependencies.

The Dockerfile correctly separates dependency installation (lines 28-32) from application code copying (lines 35-38), which optimizes Docker layer caching and speeds up rebuilds during development.

strix/agents/StrixAgent/strix_agent.py (2)

12-25: LGTM! Clean agent role initialization.

The agent_role assignment logic correctly identifies root agents and propagates the role through both LLMConfig and AgentState, supporting role-based behavior throughout the system.


30-31: Well-structured scope context injection.

The scope context and exclusion rules are cleanly integrated into the task description with clear XML-style delimiters. This provides agents with structured engagement boundaries while maintaining backward compatibility when scope is not provided.

Also applies to: 98-132

docs/SCOPE_CONFIGURATION.md (1)

1-407: LGTM! Comprehensive and well-structured documentation.

The scope configuration documentation is thorough, covering all necessary aspects from quick start to advanced integration. The examples are clear, the structure is logical, and it effectively bridges user-facing features with the underlying implementation.

docs/PROMPTING_GUIDE.md (1)

1-421: LGTM! Excellent prompting guide.

This documentation provides clear guidance for both interactive and headless usage, with well-structured examples and comprehensive reference tables. The architecture diagrams and keyword mappings are particularly helpful.

strix/tools/__init__.py (3)

13-22: LGTM! Role-based tool access control imports.

The addition of TOOL_PROFILES and is_tool_allowed_for_role to the public API surface enables role-based tool filtering, which aligns with the broader PR's goal of enhancing agent state management and security.


35-35: LGTM! Progress tooling integration.

The import of the progress module in non-sandbox mode extends the tool surface with save/load/list capabilities for agent progress tracking.


52-69: LGTM! Exports align with imports.

The __all__ list correctly includes the newly imported TOOL_PROFILES and is_tool_allowed_for_role symbols.

templates/scope/scope.json (1)

1-86: LGTM! Well-structured scope template.

The JSON template provides a clear, comprehensive example of the scope configuration format. The structure aligns with the ScopeConfigModel and includes appropriate placeholder values for all key fields.

strix/scope/__init__.py (1)

1-34: LGTM! Clean public API surface.

The scope package exposes a well-organized public API with appropriate models, parsing, and validation utilities. The semantic grouping in __all__ (Main class, Models, Validation) aids readability.

templates/scope/scope.yaml (1)

1-100: LGTM! Clear and comprehensive YAML template.

The YAML template provides excellent inline documentation and covers all scope configuration aspects. The comments guide users effectively, and the structure aligns with both the JSON template and the underlying ScopeConfigModel.

cache-implementation-project.md (1)

1-429: LGTM! Comprehensive cache design document.

This design document thoroughly explores the cache implementation options, provides a sensible phased approach, and includes detailed data structures and integration points. The progression from file-backed MVP to Redis production deployment is well-reasoned.

templates/scope/proxmox-cluster.yaml (1)

1-101: LGTM! Well-structured Proxmox assessment template.

The template provides a realistic Proxmox cluster assessment scenario with appropriate network segmentation, target definitions, and exclusions. The structure aligns with the scope model, and the configurations are sensible for a hypervisor cluster assessment.

strix/interface/cli.py (1)

200-204: Scope context wiring into scan_config looks correct

Using hasattr(args, "scope_config") and injecting both scope_context and exclusion_rules via the parser’s get_agent_context() / get_exclusion_rules() aligns well with the scope model and keeps CLI behavior in sync with the TUI.

strix/tools/progress/progress_actions.py (2)

157-196: load_progress behavior and error messages look clear

The input validation, lazy on-disk load, and failure response (including an available_keys list) are well thought out and should make it easy for agents to recover from missing keys.


198-235: list_progress output structure is practical for UIs/agents

Returning a sorted list of entries with key, timestamps, and a simple size_hint string, plus total_count, gives enough structure for both human-readable views and automated consumers.

strix/llm/request_queue.py (1)

65-89: Streaming path and slot management look correct and resilient

The streaming API (make_streaming_request_streaming_request_iter_stream) correctly shares the same slot‑acquisition and delay behavior as make_request, accumulates content, respects an early‑exit stop_condition, and always releases the semaphore in a finally block. The _iter_stream wrapper handling both async and sync generators is a nice touch to normalize different litellm streaming backends.

Also applies to: 103-154

strix/prompts/coordination/root_agent.jinja (1)

1-379: Coordination prompt is well-structured and consistent with a non-executing controller role

The expanded root‑agent prompt clearly separates coordination from execution, encodes operational modes, scope/safety constraints, inter‑agent messaging, and reporting expectations in a structured way. It should give the coordinator a much more predictable behavior surface while keeping destructive actions gated behind user confirmation.

strix/agents/base_agent.py (1)

249-277: Consecutive empty-response handling and iteration warnings are a good robustness improvement

The MAX_EMPTY_RESPONSES guard and _check_iteration_warnings logic add valuable self‑correction: they nudge the model away from empty replies, push it toward calling finish tools as it nears max_iterations, and ultimately fail fast after repeated non‑responses. Aside from the failure‑vs‑completion issue already noted, the corrective user messages and explicit instruction about finish_scan/agent_finish should improve overall agent reliability.

Also applies to: 404-437

strix/tools/executor.py (1)

399-432: Resolve UnboundLocalError in parallel execution batch.

parallel_batch = [] inside execute_parallel_batch() rebinds the name in the inner scope, so the earlier if not parallel_batch: raises UnboundLocalError the first time we flush a batch. That means any parallelizable tool immediately crashes the executor. Clear the shared list in place instead of rebinding it.

Apply this diff so the outer list stays accessible:

-        parallel_batch = []
+        parallel_batch.clear()

Likely an incorrect or invalid review comment.

Comment on lines +278 to +300
def _handle_llm_error(self, e: LLMRequestFailedError, tracer: Optional["Tracer"]) -> None:
"""Handle LLM request failure."""
error_msg = str(e)
error_details = getattr(e, "details", None)
self.state.add_error(error_msg)

if self.non_interactive:
self.state.set_failed(error_msg)
else:
self.state.enter_waiting_state(llm_failed=True)

if tracer:
tracer.update_agent_status(self.state.agent_id, "llm_failed", error_msg)
if error_details:
tracer.log_tool_execution_start(
self.state.agent_id,
"llm_error_details",
{"error": error_msg, "details": error_details},
)
tracer.update_tool_execution(
tracer._next_execution_id - 1, "failed", error_details
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Tracer tool execution update should use the returned execution_id, not a private counter

In _handle_llm_error, you correctly log a synthetic tool execution with "llm_error_details", but you ignore the execution_id returned by log_tool_execution_start and instead call update_tool_execution with tracer._next_execution_id - 1. This relies on a private attribute and can easily desynchronize if other executions are logged concurrently.

Align this with the pattern used in __init__:

-            if error_details:
-                tracer.log_tool_execution_start(
-                    self.state.agent_id,
-                    "llm_error_details",
-                    {"error": error_msg, "details": error_details},
-                )
-                tracer.update_tool_execution(
-                    tracer._next_execution_id - 1, "failed", error_details
-                )
+            if error_details:
+                exec_id = tracer.log_tool_execution_start(
+                    agent_id=self.state.agent_id,
+                    tool_name="llm_error_details",
+                    args={"error": error_msg, "details": error_details},
+                )
+                tracer.update_tool_execution(
+                    execution_id=exec_id,
+                    status="failed",
+                    result=error_details,
+                )

This removes dependency on tracer internals and guarantees the right execution record is updated.

Comment thread strix/interface/main.py
Comment on lines +335 to +353
parser.add_argument(
"-v",
"--verbose",
action="store_true",
help=(
"Enable verbose output showing real-time agent events "
"(iterations, LLM requests/responses, tool executions, etc.). "
"Best used with -n/--non-interactive mode."
),
)

parser.add_argument(
"--debug",
action="store_true",
help=(
"Enable debug mode with full LiteLLM debug logging. "
"Shows detailed API request/response information."
),
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

--debug and --verbose together will downgrade logging from DEBUG to INFO

In main(), --debug sets the root logger to DEBUG, but the subsequent --verbose block always sets it to INFO. If a user passes both flags, they’ll end up with INFO‑level logs instead of DEBUG, which is counter‑intuitive.

One way to preserve intent is to make these mutually consistent, e.g.:

-    # Enable debug mode if requested
-    if getattr(args, "debug", False):
+    # Enable debug mode if requested
+    if getattr(args, "debug", False):
         ...
         logging.getLogger().setLevel(logging.DEBUG)
-
-    # Enable verbose mode via environment for CLI to pick up
-    if getattr(args, "verbose", False):
-        os.environ["STRIX_VERBOSE"] = "true"
-        # Verbose implies we want to see more logging too
-        logging.getLogger().setLevel(logging.INFO)
+    # Enable verbose mode via environment for CLI to pick up
+    elif getattr(args, "verbose", False):
+        os.environ["STRIX_VERBOSE"] = "true"
+        logging.getLogger().setLevel(logging.INFO)

so --debug takes precedence when both are supplied.

Also applies to: 640-653

🤖 Prompt for AI Agents
In strix/interface/main.py around lines 335-353 (and similarly at 640-653), the
current logic sets the root logger to DEBUG when --debug is set but then
unconditionally sets it to INFO when --verbose is set, so passing both flags
results in INFO instead of DEBUG; change the condition so --debug takes
precedence: only set logging to INFO for --verbose when --debug is not set (or
set logging based on the highest-priority flag by checking debug first), and
apply the same precedence fix to the block at 640-653 so that when both flags
are supplied the root logger remains at DEBUG.

Comment thread strix/llm/__init__.py
Comment on lines +24 to +25
if os.getenv("STRIX_DEBUG", "").lower() != "true":
litellm._logging._disable_debugging()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

Verify compatibility with litellm's private API.

The code accesses litellm._logging._disable_debugging(), which is a private method (double underscore prefix). Private APIs are not guaranteed to be stable across versions and may break without notice.

Consider asking the maintainers to check litellm documentation or release notes for the stability of this API, or implement a try-except fallback:

try:
    litellm._logging._disable_debugging()
except (AttributeError, Exception):
    # Fallback: set log level or disable via public API if available
    pass

🏁 Script executed:

#!/bin/bash
# Check if litellm version is pinned and verify the private API is documented
rg -n "litellm[>=<]" pyproject.toml poetry.lock

# Search for any litellm version-related issues or breaking changes in dependencies
rg -n "_disable_debugging" --type py

Length of output: 66






🏁 Script executed:

#!/bin/bash
# Check pyproject.toml for litellm dependency
echo "=== pyproject.toml litellm dependency ==="
rg -n "litellm" pyproject.toml -A 2 -B 2

# Check poetry.lock for exact litellm version
echo "=== poetry.lock litellm version ==="
rg -n "name = \"litellm\"" poetry.lock -A 5

# Look for any litellm documentation or imports in the codebase
echo "=== litellm imports and usage ==="
rg -n "import litellm" --type py

Length of output: 1080


🌐 Web query:

litellm library public API disable logging debugging

💡 Result:

Short answer — yes. Ways to disable or reduce LiteLLM logging via the public API:

  • Per-request: include "no-log": true in the request body to skip tracking/logging for that call. [1][2]
  • Per-request (callbacks): send header x-litellm-disable-callbacks: <callback_name> to disable specific callbacks for that request (enterprise feature). [1]
  • Config-level: control logging/callbacks in your config.yaml (e.g., turn_off_message_logging / litellm_settings callbacks / global_disable_no_log_param). [1][2]
  • Observability integrations (e.g., Langfuse) also accept per-call flags/metadata; use the same "no-log" flag to disable logging for specific calls. [2]

References:

  1. LiteLLM logging docs (shows no-log, callbacks, config flags). [1]
  2. LiteLLM Langfuse / observability docs (per-call no-log and metadata examples). [2]

Add error handling for private API usage.

The code uses litellm._logging._disable_debugging() (strix/llm/init.py:25), a private method not guaranteed to remain stable across versions. While litellm version is constrained to ~1.79.1, this private API could break in future releases.

Implement the suggested fallback to handle potential API changes:

try:
    litellm._logging._disable_debugging()
except (AttributeError, Exception):
    # Fallback: private API unavailable in this litellm version
    pass
🤖 Prompt for AI Agents
In strix/llm/__init__.py around lines 24 to 25, the code calls the private API
litellm._logging._disable_debugging() which may not exist in other litellm
versions; wrap that call in a try/except that catches AttributeError (and a
general Exception as a safe fallback) and silently handles the missing/private
API by no-oping in the except block so the package continues to work if the
private method is absent or raises.

Comment thread strix/llm/llm.py
Comment on lines +468 to +495
def _emit_llm_error(
self, tracer: Any, request_id: str | None, start_time: float, error: str,
error_type: str | None = None, details: str | None = None
) -> None:
"""Emit LLM error event to tracer with full details."""
if tracer and self.agent_id and request_id:
duration_ms = (time.time() - start_time) * 1000
# Build detailed error message
error_parts = [error]
if error_type:
error_parts.insert(0, f"[{error_type}]")
if details:
# Truncate very long details but keep useful info
if len(details) > 500:
details = details[:500] + "..."
error_parts.append(f"\nDetails: {details}")

full_error = " ".join(error_parts) if error_type else error
if details:
full_error = f"{full_error}\nDetails: {details}"

tracer.log_llm_error(
agent_id=self.agent_id,
request_id=request_id,
error=full_error,
duration_ms=duration_ms,
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Bug: _emit_llm_error duplicates “Details” text in the final error message

_emit_llm_error appends a Details: segment to error_parts and then appends another Details: line when building full_error, so detailed traces are logged twice.

A minimal fix is to build error_parts once and drop the second concatenation:

-        error_parts = [error]
-        if error_type:
-            error_parts.insert(0, f"[{error_type}]")
-        if details:
-            # Truncate very long details but keep useful info
-            if len(details) > 500:
-                details = details[:500] + "..."
-            error_parts.append(f"\nDetails: {details}")
-
-        full_error = " ".join(error_parts) if error_type else error
-        if details:
-            full_error = f"{full_error}\nDetails: {details}"
+        error_parts: list[str] = []
+        if error_type:
+            error_parts.append(f"[{error_type}]")
+        error_parts.append(error)
+        if details:
+            # Truncate very long details but keep useful info
+            if len(details) > 500:
+                details = details[:500] + "..."
+            error_parts.append(f"Details: {details}")
+
+        full_error = " ".join(error_parts)
🤖 Prompt for AI Agents
In strix/llm/llm.py around lines 468 to 495, the function _emit_llm_error
appends a "Details:" entry to error_parts and then again concatenates a second
"Details:" block into full_error, producing duplicated details in logs; remove
the second concatenation and build full_error once from " ".join(error_parts)
(or join with "\n" if you want the Details on a new line), keeping the existing
truncation logic and the conditional inclusion of error_type so details are only
added once.

Comment on lines +113 to +127
def invalidate(self, model: str | None = None) -> int:
"""Invalidate cache entries, optionally filtered by model.

Returns the number of entries invalidated.
"""
with self._lock:
if model is None:
count = len(self._cache)
self._cache.clear()
return count

# Would need to store model in entry to filter - for now just clear all
count = len(self._cache)
self._cache.clear()
return count

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

invalidate(model=...) ignores the model argument; align docstring with behavior

The docstring says entries are “optionally filtered by model”, but the current implementation always clears the entire cache and even comments that model filtering isn’t implemented yet:

if model is None:
    ...
# Would need to store model in entry to filter - for now just clear all
count = len(self._cache)
self._cache.clear()

To avoid surprising callers, either (a) actually store the model in CacheEntry and filter accordingly, or (b) update the docstring to state that model is currently ignored and that the method invalidates all entries.

🤖 Prompt for AI Agents
In strix/llm/response_cache.py around lines 113 to 127, the invalidate method's
docstring claims it can filter by model but the implementation always clears the
entire cache; update the docstring and method comment to state that the model
argument is currently ignored and the method invalidates all entries (add a
short TODO noting model-based invalidation would require storing model on
CacheEntry), and ensure the signature and return behavior remain unchanged so
callers aren’t misled.

Comment thread strix/scope/parser.py
Comment on lines +325 to +348
def _resolve_credential(self, cred: CredentialDefinition) -> dict[str, str] | None:
"""Resolve a credential definition to actual values."""
result: dict[str, str] = {
"username": cred.username,
"access_level": cred.access_level,
}

if cred.password_env:
password = os.environ.get(cred.password_env)
if password:
result["password"] = password
result["type"] = "password"
else:
return None # Env var not set

if cred.token_env:
token = os.environ.get(cred.token_env)
if token:
result["token"] = token
result["type"] = "token"
elif not cred.password_env:
return None # Neither password nor token available

return result

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Credential type can be ambiguous when both password and token are present.

Lines 332-346 in _resolve_credential set result["type"] to "password" when password_env is present, then potentially overwrite it with "token" if token_env is also present. This means if both are provided, only the "token" type is recorded, even though both credentials are in the result dict.

Consider one of these approaches:

Option 1: Support both types explicitly:

     def _resolve_credential(self, cred: CredentialDefinition) -> dict[str, str] | None:
         """Resolve a credential definition to actual values."""
         result: dict[str, str] = {
             "username": cred.username,
             "access_level": cred.access_level,
         }
+        auth_types = []
 
         if cred.password_env:
             password = os.environ.get(cred.password_env)
             if password:
                 result["password"] = password
-                result["type"] = "password"
+                auth_types.append("password")
             else:
                 return None  # Env var not set
 
         if cred.token_env:
             token = os.environ.get(cred.token_env)
             if token:
                 result["token"] = token
-                result["type"] = "token"
+                auth_types.append("token")
             elif not cred.password_env:
                 return None  # Neither password nor token available
 
+        if auth_types:
+            result["type"] = "+".join(auth_types)  # e.g., "password+token"
         return result

Option 2: Make password and token mutually exclusive (validate in the model).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _resolve_credential(self, cred: CredentialDefinition) -> dict[str, str] | None:
"""Resolve a credential definition to actual values."""
result: dict[str, str] = {
"username": cred.username,
"access_level": cred.access_level,
}
if cred.password_env:
password = os.environ.get(cred.password_env)
if password:
result["password"] = password
result["type"] = "password"
else:
return None # Env var not set
if cred.token_env:
token = os.environ.get(cred.token_env)
if token:
result["token"] = token
result["type"] = "token"
elif not cred.password_env:
return None # Neither password nor token available
return result
def _resolve_credential(self, cred: CredentialDefinition) -> dict[str, str] | None:
"""Resolve a credential definition to actual values."""
result: dict[str, str] = {
"username": cred.username,
"access_level": cred.access_level,
}
auth_types = []
if cred.password_env:
password = os.environ.get(cred.password_env)
if password:
result["password"] = password
auth_types.append("password")
else:
return None # Env var not set
if cred.token_env:
token = os.environ.get(cred.token_env)
if token:
result["token"] = token
auth_types.append("token")
elif not cred.password_env:
return None # Neither password nor token available
if auth_types:
result["type"] = "+".join(auth_types)
return result
🤖 Prompt for AI Agents
In strix/scope/parser.py around lines 325 to 348, the credential "type" field is
overwritten when both password_env and token_env are present; change the logic
to support both credentials instead of letting token overwrite password: when
both env vars resolve successfully, include both "password" and "token" in
result and set result["type"] to a distinct value (e.g., "both" or
"password+token"); preserve the existing None returns when neither credential is
available and keep existing single-type behavior when only one is present.

Comment on lines +86 to +152
@register_tool(sandbox_execution=False)
def save_progress(
key: str,
data: dict[str, Any],
append: bool = False,
) -> dict[str, Any]:
"""Save structured progress data to disk for persistence across crashes.

Use this to track:
- Scanned endpoints/targets
- Discovered services
- Intermediate findings
- Task completion status
- Any data you want to persist and reference later

Args:
key: Unique identifier for this progress data (e.g., "scanned_ports", "discovered_services")
data: The data to save (must be JSON-serializable)
append: If True and key exists with a list value, append data to the list instead of replacing

Returns:
Success status and file path
"""
try:
if not key or not key.strip():
return {"success": False, "error": "Key cannot be empty"}

if not isinstance(data, dict):
return {"success": False, "error": "Data must be a dictionary"}

# Load existing progress on first access
if not _progress_cache:
_load_progress_from_disk()

key = key.strip()
timestamp = datetime.now(UTC).isoformat()

if append and key in _progress_cache:
existing = _progress_cache[key].get("data")
if isinstance(existing, list) and isinstance(data.get("items"), list):
# Append items to existing list
existing.extend(data["items"])
_progress_cache[key]["updated_at"] = timestamp
else:
# Can't append, just update
_progress_cache[key] = {
"data": data,
"created_at": _progress_cache[key].get("created_at", timestamp),
"updated_at": timestamp,
}
else:
_progress_cache[key] = {
"data": data,
"created_at": timestamp,
"updated_at": timestamp,
}

# Immediately persist to disk
persisted = _save_progress_to_disk()

return {
"success": True,
"message": f"Progress '{key}' saved successfully",
"key": key,
"persisted_to_disk": persisted,
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

save_progress docstring mentions file path but the return payload does not

The docstring says:

Returns:
    Success status and file path

but the returned dict is:

{
    "success": True,
    "message": ...,
    "key": key,
    "persisted_to_disk": persisted,
}

To avoid confusion (especially for tools consuming the schema), either add a file_path field (e.g., pointing to progress.json when available) or update the docstring to match the current return shape.

🧰 Tools
🪛 Ruff (0.14.5)

146-151: Consider moving this statement to an else block

(TRY300)

🤖 Prompt for AI Agents
In strix/tools/progress/progress_actions.py around lines 86 to 152 the
save_progress docstring claims the function returns a file path but the actual
return dict contains "success", "message", "key", and "persisted_to_disk" (no
file path). Update the docstring's Returns section to accurately list and
describe the returned fields (success: bool, message: str, key: str,
persisted_to_disk: bool) or, alternatively, modify the return value to include a
"file_path" key when available — pick one consistent approach and implement it
so docstring and actual return shape match.

Comment on lines 782 to 786
def get_proxy_manager() -> ProxyManager:
global _PROXY_MANAGER
if _PROXY_MANAGER is None:
return ProxyManager()
_PROXY_MANAGER = ProxyManager()
return _PROXY_MANAGER

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: Singleton initialization is not thread-safe.

The check-then-act pattern without synchronization creates a race condition. If multiple threads call get_proxy_manager() concurrently before initialization, they could all see _PROXY_MANAGER as None and create multiple instances, violating the singleton guarantee.

Apply this diff to add thread-safe lazy initialization:

+import threading
+
+_PROXY_MANAGER: ProxyManager | None = None
+_PROXY_MANAGER_LOCK = threading.Lock()


 def get_proxy_manager() -> ProxyManager:
-    global _PROXY_MANAGER
-    if _PROXY_MANAGER is None:
-        _PROXY_MANAGER = ProxyManager()
-    return _PROXY_MANAGER
+    global _PROXY_MANAGER
+    if _PROXY_MANAGER is None:
+        with _PROXY_MANAGER_LOCK:
+            if _PROXY_MANAGER is None:
+                _PROXY_MANAGER = ProxyManager()
+    return _PROXY_MANAGER

Note: The double-checked locking pattern checks twice to avoid lock contention after initialization.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In strix/tools/proxy/proxy_manager.py around lines 782 to 786, the lazy
singleton creation for _PROXY_MANAGER is not thread-safe; wrap initialization
with a module-level threading.Lock and use double-checked locking: add a
_PROXY_MANAGER_LOCK = threading.Lock() at module scope (and import threading if
missing), then in get_proxy_manager() first check if _PROXY_MANAGER is None, if
so acquire the lock, check again, create and assign ProxyManager() only inside
the lock, then release the lock and return the singleton; this ensures
thread-safe lazy initialization while avoiding unnecessary lock contention after
the instance is created.

Comment thread strix/tools/registry.py
Comment on lines +15 to +155
# Role-based tool profiles - None means all tools available
TOOL_PROFILES: dict[str, list[str] | None] = {
"root": [
"create_agent",
"view_agent_graph",
"finish_scan",
"send_message_to_agent",
"wait_for_message",
"think",
# Progress tracking (available to all roles)
"save_progress",
"load_progress",
"list_progress",
# Notes
"create_note",
"list_notes",
],
"recon": [
"terminal_execute",
"python_action",
"browser_action",
"list_requests",
"view_request",
"send_request",
"repeat_request",
"scope_rules",
"list_sitemap",
"view_sitemap_entry",
"think",
"agent_finish",
"create_agent",
"view_agent_graph",
"send_message_to_agent",
"wait_for_message",
"str_replace_editor",
"list_files",
"search_files",
"web_search",
# Progress tracking
"save_progress",
"load_progress",
"list_progress",
# Notes
"create_note",
"list_notes",
"update_note",
],
"testing": [
"terminal_execute",
"python_action",
"browser_action",
"list_requests",
"view_request",
"send_request",
"repeat_request",
"scope_rules",
"list_sitemap",
"view_sitemap_entry",
"think",
"agent_finish",
"create_agent",
"view_agent_graph",
"send_message_to_agent",
"wait_for_message",
"str_replace_editor",
"list_files",
"search_files",
"web_search",
# Progress tracking
"save_progress",
"load_progress",
"list_progress",
# Notes
"create_note",
"list_notes",
"update_note",
# Vulnerability reporting
"create_vulnerability_report",
],
"validation": [
"terminal_execute",
"python_action",
"browser_action",
"list_requests",
"view_request",
"send_request",
"repeat_request",
"scope_rules",
"list_sitemap",
"view_sitemap_entry",
"think",
"agent_finish",
"str_replace_editor",
"list_files",
"search_files",
"send_message_to_agent",
# Progress tracking
"save_progress",
"load_progress",
"list_progress",
# Notes
"create_note",
"list_notes",
# Vulnerability reporting
"create_vulnerability_report",
],
"reporting": [
"create_vulnerability_report",
"str_replace_editor",
"list_files",
"search_files",
"think",
"agent_finish",
"send_message_to_agent",
# Progress tracking
"save_progress",
"load_progress",
"list_progress",
# Notes
"create_note",
"list_notes",
"update_note",
],
"fixing": [
"str_replace_editor",
"list_files",
"search_files",
"terminal_execute",
"python_action",
"think",
"agent_finish",
"send_message_to_agent",
# Progress tracking
"save_progress",
"load_progress",
"list_progress",
# Notes
"create_note",
"list_notes",
],
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

delete_note is not allowed for any built‑in role

TOOL_PROFILES defines per‑role tool access, but delete_note is missing from all profiles. Given that delete_note is a registered tool, any agent using one of the known roles (root, recon, testing, validation, reporting, fixing) and going through is_tool_allowed_for_role will never be able to call it.

If this is unintentional, consider adding delete_note to the appropriate roles (likely at least root and the roles that can update_note) to avoid a silent loss of functionality.

Example (minimal):

     "root": [
         "create_agent",
         "view_agent_graph",
         "finish_scan",
@@
         # Notes
         "create_note",
         "list_notes",
+        "update_note",
+        "delete_note",
     ],
@@
     "recon": [
@@
         # Notes
         "create_note",
         "list_notes",
         "update_note",
+        "delete_note",
     ],

…and similarly for any other roles you want to permit deletions for.

Also applies to: 313-347

🤖 Prompt for AI Agents
In strix/tools/registry.py around lines 15 to 155, the TOOL_PROFILES dict omits
"delete_note" from all built-in roles so agents with those roles cannot call the
registered delete_note tool; update the dict to include "delete_note" for the
appropriate roles (at minimum "root" and any roles that already have
"update_note" such as "recon", "testing", "reporting") by adding "delete_note"
to each relevant role's list so permission checks via is_tool_allowed_for_role
will allow note deletion.

Comment on lines +102 to +111
test_focus:
- "API authentication bypass"
- "Privilege escalation (user to admin)"
- "VM escape vectors"
- "Container (LXC) escape"
- "Cluster authentication (corosync)"
- "Default credentials"
- "Known CVEs (CVE-2022-35508, etc.)"
- "Backup access controls"
- "Storage permission misconfigurations"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Invalid field: test_focus is not part of the scope model.

The test_focus section is not a valid top-level field in ScopeConfigModel. If this is meant to be documentation, convert it to a comment. If it's meant to be part of the configuration, consider adding these focus areas to the individual target definitions using the focus_areas field.

Apply this diff to convert to a comment:

-# Focus areas for Proxmox testing
-test_focus:
-  - "API authentication bypass"
-  - "Privilege escalation (user to admin)"
-  - "VM escape vectors"
-  - "Container (LXC) escape"
-  - "Cluster authentication (corosync)"
-  - "Default credentials"
-  - "Known CVEs (CVE-2022-35508, etc.)"
-  - "Backup access controls"
-  - "Storage permission misconfigurations"
+# Focus areas for Proxmox testing:
+# - API authentication bypass
+# - Privilege escalation (user to admin)
+# - VM escape vectors
+# - Container (LXC) escape
+# - Cluster authentication (corosync)
+# - Default credentials
+# - Known CVEs (CVE-2022-35508, etc.)
+# - Backup access controls
+# - Storage permission misconfigurations

Alternatively, add these to each target's focus_areas field:

targets:
  - host: "10.0.101.2"
    name: "pve-mgmt"
    type: "infrastructure"
    network: "Proxmox Management"
    focus_areas: 
      - "authentication_bypass"
      - "privilege_escalation"
      - "vm_escape"
    # ... rest of target definition
🤖 Prompt for AI Agents
In templates/scope/proxmox-cluster.yaml around lines 102 to 111, the top-level
field `test_focus` is invalid for ScopeConfigModel; either convert that block
into a YAML comment or move each focus item into the appropriate target's
`focus_areas` list. To fix, replace the `test_focus` mapping with commented
lines (prefix with #) if it is only documentation, or add the corresponding
normalized focus entries (e.g., authentication_bypass, privilege_escalation,
vm_escape, container_escape, cluster_authentication, default_credentials,
known_cves, backup_access_controls, storage_permissions) under each relevant
target’s `focus_areas` array in the `targets:` section so the config conforms to
the model.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant