Status: Implemented
Author: Mohamed Ameen
Date: 2026-04-17
Last Updated: 2026-04-17
Reviewers: --
Package: src/plugins/
Entry Point: N/A (library-only, consumed at orchestrator startup)
The plugin system allows third-party packages to extend AutoDev without modifying its source. Plugins are discovered at runtime via Python's importlib.metadata.entry_points mechanism and classified into three kinds: QA gates, tournament judges, and agent extensions. The system achieves zero coupling -- plugin authors need not import any AutoDev types. Instead, they implement one of three @runtime_checkable Protocol classes that define the required shape (structural subtyping).
In scope:
- Three plugin protocols:
QAGatePlugin,JudgeProviderPlugin,AgentExtensionPlugin. PluginRegistrydataclass that buckets discovered instances by kind.discover_plugins()function that walks entry points, loads, instantiates, classifies, and registers plugins.- Carrier dataclasses
QAContextandGateResultfor QA gate plugins. - Failure isolation guarantees.
Out of scope:
- Built-in QA gate implementations (Phase 8).
- Plugin configuration / parameterization beyond zero-arg construction.
- Plugin lifecycle management (hot-reload, unload).
- Plugin versioning or compatibility negotiation.
Plugins sit at the boundary of AutoDev's core pipeline. QA gate plugins are invoked during the QA phase after agent code generation. Judge provider plugins are invoked during tournament rounds to rank candidate solutions. Agent extension plugins contribute or override agent definitions during workspace initialization. All three are optional extensions -- AutoDev functions fully without any plugins installed.
flowchart TB
subgraph "Third-Party Packages"
P1["mypkg.plugins:MyQAGate"]
P2["mypkg.plugins:MyJudge"]
P3["mypkg.plugins:MyAgentExt"]
end
subgraph "pyproject.toml entry points"
EP["autodev.plugins group"]
end
subgraph "AutoDev Runtime"
DP["discover_plugins()"]
REG["PluginRegistry"]
QAB["qa_gates dict"]
JB["judges dict"]
AB["agents dict"]
end
P1 --> EP
P2 --> EP
P3 --> EP
EP --> DP
DP --> REG
REG --> QAB
REG --> JB
REG --> AB
- FR-1: Discover all plugins declared under the
autodev.pluginsentry-point group usingimportlib.metadata.entry_points. - FR-2: Classify each discovered plugin into exactly one of three buckets (QA gate, judge, agent extension) based on which
@runtime_checkableProtocol it satisfies. - FR-3: Plugins that fail to load (
ImportError), fail to instantiate (exception in__init__), or match no supported Protocol are logged at WARNING level and skipped -- never crash the host. - FR-4: Each plugin must expose a
name: strattribute. If missing or empty, the entry-point name is used as fallback. - FR-5: Name collisions within a bucket are resolved by last-writer-wins (entry-point iteration order is deterministic by package name).
- FR-6: Pre-instantiated values (non-class targets) are accepted directly without calling
target(). - FR-7: Provide
QAContextandGateResultcarrier dataclasses for QA gate plugins.
- Crash-safety: A malformed or broken plugin must never crash the AutoDev host process. All failures are caught, logged, and skipped.
- Subprocess isolation: Plugin discovery runs in the main orchestrator process. Plugin execution (e.g.,
QAGatePlugin.run()) is async and must not block the event loop. - Asyncio concurrency:
QAGatePlugin.run()andJudgeProviderPlugin.rank()are async methods, allowing the orchestrator to invoke them concurrently with bounded semaphores. - Pydantic v2 strict validation: The plugin system itself does not use Pydantic models for plugin data (carrier dataclasses are plain
@dataclass), preserving the zero-coupling guarantee. - LLM cost efficiency: Plugins do not inherently make LLM calls. However, judge provider plugins will make LLM calls when ranking -- this is bounded by the tournament engine's round limits.
- Maintainability: All logging uses
structlogvia the project'sautologgingmodule.
- Plugin classes must be instantiable with zero arguments (
cls()). - Plugins must not import AutoDev types at module level (zero-coupling).
- Entry-point group name is fixed to
autodev.plugins. runtime_checkableProtocol checking only validates attribute presence, not method signatures. This is a known Python limitation accepted as sufficient for this loose contract.
flowchart TB
subgraph "src/plugins/"
init["__init__.py<br/>Re-exports public API"]
reg["registry.py<br/>Protocols + Registry + discover_plugins()"]
end
subgraph "Protocols"
QAP["QAGatePlugin"]
JPP["JudgeProviderPlugin"]
AEP["AgentExtensionPlugin"]
end
subgraph "Carriers"
QAC["QAContext"]
GR["GateResult"]
end
reg --> QAP
reg --> JPP
reg --> AEP
reg --> QAC
reg --> GR
init --> reg
| File | Element | Responsibility |
|---|---|---|
src/plugins/__init__.py |
Module init | Re-exports all public symbols from registry.py |
src/plugins/registry.py |
QAGatePlugin |
Protocol for custom QA gates |
src/plugins/registry.py |
JudgeProviderPlugin |
Protocol for custom tournament judges |
src/plugins/registry.py |
AgentExtensionPlugin |
Protocol for agent definition contributors |
src/plugins/registry.py |
QAContext |
Dataclass carrier: inputs for QA gate execution |
src/plugins/registry.py |
GateResult |
Dataclass carrier: verdict from a QA gate |
src/plugins/registry.py |
PluginRegistry |
Dataclass bucketing discovered plugins by kind |
src/plugins/registry.py |
discover_plugins() |
Entry-point walker and classifier |
src/plugins/registry.py |
_plugin_name() |
Helper to extract name from instance or fallback to entry-point name |
@dataclass
class QAContext:
"""Inputs handed to QAGatePlugin.run()."""
cwd: Path # Repo worktree path
task_id: str # Task correlation ID
diff: str | None = None # Optional raw diff text
@dataclass
class GateResult:
"""Verdict emitted by a QAGatePlugin."""
passed: bool
details: str = ""@dataclass
class PluginRegistry:
"""Buckets discovered plugin instances by kind."""
qa_gates: dict[str, QAGatePlugin] = field(default_factory=dict)
judges: dict[str, JudgeProviderPlugin] = field(default_factory=dict)
agents: dict[str, AgentExtensionPlugin] = field(default_factory=dict)
@property
def total(self) -> int:
return len(self.qa_gates) + len(self.judges) + len(self.agents)
def is_empty(self) -> bool:
return self.total == 0@runtime_checkable
class QAGatePlugin(Protocol):
"""A custom QA gate that evaluates a checked-out diff."""
name: str
async def run(self, ctx: QAContext) -> GateResult: ...
@runtime_checkable
class JudgeProviderPlugin(Protocol):
"""A custom tournament judge returning a ranking of version ids."""
name: str
async def rank(self, task: str, versions: list[Any]) -> list[str]: ...
@runtime_checkable
class AgentExtensionPlugin(Protocol):
"""A plugin that contributes or overrides an agent definition."""
name: str
def get_spec(self) -> Any: ...
def render_platform(self, platform: str) -> str: ...Key design points:
- All three protocols are
@runtime_checkable, enablingisinstance()checks at discovery time. QAGatePlugin.run()andJudgeProviderPlugin.rank()areasyncso long-running operations (subprocess gates, LLM calls) do not block the event loop.AgentExtensionPlugin.get_spec()returnsAny(duck-typed) to avoid a hard import cycle withadapters.types.AgentSpec.- The
name: strattribute is required by all three protocols for registration keying.
discover_plugins(group: str = "autodev.plugins") -> PluginRegistry
The sole entry point for the plugin system. Called once at orchestrator startup.
| Parameter | Type | Default | Description |
|---|---|---|---|
group |
str |
"autodev.plugins" |
Entry-point group to scan. Override only in tests. |
Returns a PluginRegistry populated with all successfully loaded plugins.
| Decision | Rationale | Alternatives Considered |
|---|---|---|
@runtime_checkable Protocol for plugin contracts |
Zero coupling -- plugin authors never need to import autodev. Structural subtyping checks shape at runtime. |
ABC with register(), zope.interface, explicit marker base class |
Plain @dataclass for carriers (not Pydantic) |
Carriers cross the plugin boundary. Requiring Pydantic would force a dependency on autodev's Pydantic version in every plugin package. | Pydantic BaseModel, TypedDict, plain dict |
Entry-point group autodev.plugins (single group) |
Simplicity. One group for all plugin kinds; classification is done via Protocol isinstance check. | Separate groups per kind (autodev.qa_gates, autodev.judges, autodev.agents) |
| Last-writer-wins for name collisions | Entry-point iteration order is alphabetical by package name (stdlib behavior), making this deterministic. Explicit collision handling would add complexity for an unlikely scenario. | Raise on collision, merge, first-writer-wins |
Instantiate targets with target() only if they are classes |
Supports both class-based plugins (cls()) and pre-instantiated module-level singletons. |
Always require classes, always require instances |
- Loose contract vs. strong typing:
runtime_checkableonly checks attribute presence, not method signatures or return types. A plugin could satisfy the Protocol check but return wrong types at runtime. This is accepted as the cost of zero-coupling. - Single-group vs. multi-group: Using one entry-point group means every plugin is loaded and checked against all three Protocols. With many plugins, this is O(N * 3) isinstance checks. Acceptable for the expected plugin count (< 50).
- No plugin configuration: Plugins must be constructable with zero arguments. Any configuration must come from the plugin's own environment variables or config files. This simplifies discovery but limits parameterization.
Discovery pipeline (discover_plugins):
flowchart TB
A["entry_points(group='autodev.plugins')"] --> B{For each EP}
B --> C["ep.load()"]
C -->|ImportError| LOG1["log WARNING, skip"]
C -->|success| D{Is target a class?}
D -->|yes| E["target()"]
D -->|no| F["Use target as-is"]
E -->|Exception| LOG2["log WARNING, skip"]
E -->|success| G[instance]
F --> G
G --> H{isinstance QAGatePlugin?}
H -->|yes| I["reg.qa_gates[name] = instance"]
H -->|no| J{isinstance JudgeProviderPlugin?}
J -->|yes| K["reg.judges[name] = instance"]
J -->|no| L{isinstance AgentExtensionPlugin?}
L -->|yes| M["reg.agents[name] = instance"]
L -->|no| LOG3["log WARNING: protocol_mismatch"]
I --> B
K --> B
M --> B
LOG3 --> B
The classification uses elif chains (not independent if blocks), meaning a plugin is bucketed into exactly one kind even if it satisfies multiple protocols. Priority order: QAGatePlugin > JudgeProviderPlugin > AgentExtensionPlugin.
Name resolution (_plugin_name):
Returns instance.name if it exists and is a non-empty string; otherwise falls back to the entry-point name (ep.name).
Discovery (discover_plugins) is synchronous and runs once at startup. Plugin execution (QAGatePlugin.run(), JudgeProviderPlugin.rank()) is async and can be run concurrently by the orchestrator using asyncio.gather with bounded semaphores.
Not applicable. Plugins run in-process. If a plugin needs to spawn a subprocess (e.g., a linter gate), that is the plugin's responsibility.
Not applicable. The plugin system does not write to disk.
The discovery pipeline wraps every step in try/except Exception:
| Step | Failure Mode | Handling |
|---|---|---|
entry_points(group=group) |
stdlib failure (defensive) | Log WARNING, return empty registry |
ep.load() |
ImportError, missing dependency |
Log WARNING with entry-point name and error, continue |
target() instantiation |
Exception in __init__ |
Log WARNING with entry-point name and error, continue |
Protocol isinstance check |
No match | Log WARNING protocol_mismatch with type name, continue |
This ensures that a single broken plugin never takes down the entire AutoDev process.
- Internal:
autologging.get_logger - stdlib:
importlib.metadata.entry_points,dataclasses,pathlib,typing - External: None
Plugin discovery is configuration-free. Plugins are registered solely through pyproject.toml entry points:
[project.entry-points."autodev.plugins"]
my_qa_gate = "mypkg.plugins:MyQAGate"
my_judge = "mypkg.plugins:MyJudge"
my_agent = "mypkg.plugins:MyAgentExtension"The group parameter of discover_plugins() defaults to "autodev.plugins" and can be overridden in tests.
autologging: Structured logging for discovery events.- No other AutoDev components are imported by the plugin system.
AgentExtensionPlugin.get_spec() returns an AgentSpec-compatible object (duck-typed from adapters.types). The import is intentionally lazy to avoid a hard cycle. The consuming component (workspace initializer) is responsible for validating the returned spec.
The plugin system does not write ledger events. Plugin discovery results may be logged by the orchestrator.
- QA phase engine: Iterates
registry.qa_gatesand invokes each gate'srun()method. - Tournament engine: Iterates
registry.judgesand invokes each judge'srank()method. - Workspace initializer: Iterates
registry.agentsand merges contributed specs into the agent roster. - Orchestrator startup: Calls
discover_plugins()once and passes thePluginRegistryto downstream components.
- Python packaging ecosystem: Plugins are installed as standard Python packages with entry-point declarations.
pip install mypkgis the only step needed to make a plugin visible.
- Protocol satisfaction: Create minimal classes that satisfy each protocol and verify
isinstance()returns True. - Protocol rejection: Create classes missing required attributes and verify
isinstance()returns False. - PluginRegistry.total and is_empty: Verify counting across all three buckets.
- _plugin_name: Verify fallback to entry-point name when
nameattribute is missing, empty, or non-string. - GateResult and QAContext: Verify construction and field access.
- Mock entry points: Use
importlib.metadatatest utilities or monkeypatching to simulate entry points with valid plugins, broken plugins, and protocol mismatches. Verify the registry is populated correctly and warnings are logged. - Name collision: Register two plugins with the same
name. Verify the latter wins. - Pre-instantiated target: Register a module-level instance (not a class) as an entry point. Verify it is accepted without calling
target().
- Hypothesis for PluginRegistry: Generate random combinations of plugin instances across buckets. Assert
totalalways equals the sum of bucket sizes.
- Minimal stub classes implementing each protocol (3 stubs).
- Broken stub classes that raise in
__init__, miss thenameattribute, or implement no protocol. - Mock entry-point objects or monkeypatched
importlib.metadata.entry_points.
- Arbitrary code execution: Loading an entry point executes arbitrary Python code. This is inherent to the entry-point mechanism. Operators must trust installed packages, just as they trust any
pip install. - No sandboxing: Plugins run in the same process as AutoDev. A malicious plugin has full access to the filesystem and environment. Future work may introduce subprocess isolation for untrusted plugins.
- No secret exposure: The plugin system does not handle secrets.
QAContextcontains only the working directory, task ID, and diff text.
- Discovery latency:
entry_points()scans installed package metadata, which is I/O-bound on first call but typically cached by the Python runtime. Expected latency: < 100ms for typical environments. - Classification overhead: Three
isinstance()checks per plugin, each O(1) attribute lookup. Negligible. - Plugin execution: Entirely dependent on the plugin implementation. Async methods prevent blocking the event loop, but expensive plugins (e.g., running a full test suite) should be bounded by the orchestrator's concurrency semaphore.
The plugin system is part of the core autodev wheel (src/plugins/). No separate entry point is needed for the discovery mechanism itself.
Third-party plugins register via their own pyproject.toml:
[project.entry-points."autodev.plugins"]
my_qa_gate = "mypkg.plugins:MyQAGate"None. Plugin discovery is automatic at orchestrator startup. Discovered plugins may be listed via autodev status or a future autodev plugins list command.
# Successful discovery summary
log.info("plugins.discover.done",
qa_gates=len(reg.qa_gates),
judges=len(reg.judges),
agents=len(reg.agents),
)
# Failed to load entry point
log.warning("plugins.load_failed", name=ep.name, error=str(exc))
# Failed to instantiate plugin class
log.warning("plugins.instantiate_failed", name=ep.name, error=str(exc))
# Plugin matches no supported protocol
log.warning("plugins.protocol_mismatch",
name=ep.name,
type=type(instance).__name__,
)None. Plugin discovery results are logged but not persisted to .autodev/.
A future autodev status or autodev plugins list could display:
- Number of discovered plugins per bucket.
- Plugin names and source packages.
- Any plugins that failed to load (from structured logs).
The plugin system itself makes zero LLM calls. Cost implications depend on the plugins:
| Plugin Kind | LLM Calls | Notes |
|---|---|---|
| QA gate | 0 (typically) | Most QA gates run linters/tests, not LLM calls |
| Judge provider | 1 per ranking call | Custom judges may invoke LLMs; bounded by tournament max_rounds |
| Agent extension | 0 | Contributes specs, does not invoke LLMs directly |
- Plugin configuration: Allow plugins to declare configuration schemas that are merged into
.autodev/config.jsonunder apluginskey. - Plugin versioning: Support version constraints so AutoDev can reject incompatible plugins.
- Hot-reload: Detect newly installed plugins without restarting AutoDev.
- Subprocess isolation: Run untrusted plugins in isolated subprocesses for security.
- Multi-protocol plugins: Allow a single plugin to satisfy multiple protocols (currently only the first match is used due to
elifchaining). - Built-in QA gates (Phase 8): Implement the default gates (syntax, lint, build, test, secretscan) as internal plugins using the same Protocol.
- Should plugins be able to declare dependencies on other plugins?
- Should the
elifclassification be changed to allow multi-protocol plugins (e.g., a plugin that is both a QA gate and a judge)? - Should plugin execution have its own timeout/guardrail independent of the task-level guardrails?
- Should
QAContextbe extended with additional fields (e.g., config reference, repo metadata) for Phase 8?
- ADR-005: Protocol-based plugin system (rationale for
@runtime_checkableProtocol over ABC/marker class)
src/plugins/__init__.py-- Public API re-exportssrc/plugins/registry.py-- Protocol definitions, PluginRegistry, discover_plugins()src/adapters/types.py-- AgentSpec (duck-typed by AgentExtensionPlugin)- Python documentation: importlib.metadata entry points
- Python documentation: typing.runtime_checkable
| Date | Author | Changes |
|---|---|---|
| 2026-04-17 | Mohamed Ameen | Initial draft |