Skip to content

feat(agents): PINA multi-agent team with pydantic-graph + MLflow + A2A/AG-UI#33

Merged
bjoernbethge merged 24 commits into
mainfrom
feat/agents-pina-team
Apr 21, 2026
Merged

feat(agents): PINA multi-agent team with pydantic-graph + MLflow + A2A/AG-UI#33
bjoernbethge merged 24 commits into
mainfrom
feat/agents-pina-team

Conversation

@bjoernbethge

Copy link
Copy Markdown
Collaborator

Summary

  • New marimo_flow.agents package: multi-agent PINA team orchestrated by pydantic-graph, persisted + traced via MLflow, exposed via marimo chat + A2A + AG-UI
  • RouteNode classifier dispatches between five specialists (Notebook, Problem, Model, Solver, MLflow); each loads its .claude/Skills/<name>/SKILL.md as lazy instructions= (no message-history bloat)
  • Lead agent (build_lead_agent) wraps the graph as a single tool — same backend powers mo.ui.chat (lead_chat), agent.to_a2a() (with AgentCard node_skills), and agent.to_ag_ui()
  • Models: Ollama Cloud (:cloud tags) via OpenAIChatModel — single http://localhost:11434/v1 endpoint covers local + cloud
  • State (FlowState) is JSON-serialisable + persisted as MLflow artifacts (MLflowStatePersistence); live PINA/torch objects live in FlowDeps.registry keyed by URI
  • examples/lab.py rewritten as full chat demo with state inspector and live mermaid diagram

Plan & Tasks

Followed docs/superpowers/plans/2026-04-21-marimo-flow-pina-team.md — all 20 tasks complete.

Plan-Bugs caught & fixed mid-execution

  • OpenAIModel deprecated in pydantic-ai 1.84 → switched to OpenAIChatModel
  • SimpleStatePersistence.load_all raises NotImplementedError → use FullStatePersistence instead
  • from __future__ import annotations made the planned __annotations__["next_node"].__args__ introspection break → use typing.get_args(...model_fields[...].annotation)
  • TestModel calls all tools by default with junk args + MCP toolsets connect eagerly at agent.run() → notebook/mlflow tests stub build_*_mcp with FunctionToolset and pin TestModel(call_tools=[])
  • pydantic-graph requires -> RouteNode return annotations on BaseNode.run → added to all 5 sub-nodes; build_graph() injects classes into module globals so get_type_hints resolves the forward refs without runtime import cycles
  • _define_problem/_define_model/_define_solver used bare mlflow.log_artifact → silently misroutes when no module-level active run (the await graph.run(...) case). Switched to MlflowClient().log_artifact(state.mlflow_run_id, ...)

Known issues

  • test_ag_ui_app_is_asgi_callable is xfail — pydantic-ai 1.84 AGUIApp passes on_startup/on_shutdown to Starlette which removed those kwargs. Re-enable when upstream is fixed.

Test plan

  • uv run pytest tests/agents/48 passed, 1 xfailed
  • uv run ruff check src/marimo_flow/agents/ tests/agents/ examples/lab.py — clean
  • E2E smoke test (tests/agents/test_e2e.py) drives the full route→problem→route→model→route→solver→route→end loop with real MLflow file:// store and verifies all four artifact dirs (problem/, model/, solver/, agent_state/) land
  • Manual: marimo edit examples/lab.py with marimo MCP (marimo edit --mcp --no-token --port 2718 --headless) + mlflow ui running, ask the lead agent to solve a 1D Poisson problem
  • Manual: uv run python -m marimo_flow.agents.server.a2a — confirm A2A AgentCard at http://localhost:8000/.well-known/agent.json advertises the 5 sub-node skills
  • Manual: uv run python -m marimo_flow.agents.server.ag_ui — only after upstream pydantic-ai/Starlette fix

🤖 Generated with Claude Code

claude and others added 24 commits April 21, 2026 01:24
20-task plan for src/marimo_flow/agents/ — pydantic-graph orchestration,
MLflow persistence + tracing, Ollama Cloud LLMs, lead agent exposed via
marimo chat + A2A + AG-UI, sub-agents loaded with .claude/Skills/<name>/
SKILL.md as instructions=, MCP toolsets for marimo + mlflow servers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the empty package skeleton (src/marimo_flow/agents + nodes + server,
tests/agents + tests/agents/test_nodes) and pulls in the [a2a] extra
on pydantic-ai-slim (gives us FastA2A for the A2A server in Task 16).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
State holds only IDs and message history — live PINA/torch objects live
in FlowDeps.registry, keyed by their MLflow artifact URIs. This split
keeps the state JSON-serialisable so it can be persisted as MLflow
artifacts (see Task 6).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
DEFAULT_MODELS maps each role (route/notebook/problem/model/solver/
mlflow/lead) to its preferred Ollama tag — including ':cloud' tags
that Ollama proxies to Ollama Cloud, so a single OpenAI-compatible
endpoint covers both local and cloud LLMs without any extra proxy.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Pydantic-AI 1.84 renamed OpenAIModel to OpenAIChatModel; the old name
emits a DeprecationWarning that would propagate across every sub-node
that calls get_model(). Switch at the source. Also runs ruff format
to satisfy the 88-char line-length rule.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Loads .claude/Skills/<name>/SKILL.md (project-local, falls back to
~/.claude/skills/<name>/SKILL.md) so each sub-agent gets the same
domain knowledge a Claude Code session would. build_skill_instructions
returns a no-arg callable suitable for pydantic-ai's Agent(instructions=)
hook — lazy (per-run reload), supports multiple skills per role,
and stays out of message history (no token bloat).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
build_marimo_mcp() points at http://127.0.0.1:2718/mcp/server (the URL
in .vscode/mcp.json); build_mlflow_mcp() spawns `mlflow mcp run` over
stdio with MLFLOW_TRACKING_URI configured. build_mcp_servers() is a
generic transport-selector for ad-hoc use, lifted from the marimo-agent
RAG demo with the ollama-specific bits stripped.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Extends pydantic-graph's FullStatePersistence so graph snapshots also
land as JSON artifacts under the active MLflow run. Adds pytest-asyncio
to dev deps and asyncio_mode=auto so async graph tests work without
@pytest.mark.asyncio decorators on every test.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
RouteNode runs a tiny classifier agent (gemma4-fast by default) that
returns one of {notebook, problem, model, solver, mlflow, end} via a
Pydantic Literal output_type. The five sub-nodes land as stubs in this
task — they only set state.last_node and loop back to RouteNode. Real
implementations follow in Tasks 8-12.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
NotebookNode loads the marimo + marimo-pair skills as lazy instructions
and gets the marimo MCP toolset for cell ops. Tests stub the MCP builder
with an empty FunctionToolset so they do not need a live marimo server,
and pin TestModel with call_tools=[] to skip junk-arg auto-tool-calls.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…Problem)

ProblemNode owns a single tool define_problem(spec) — the agent designs
the PDE/BC/domain spec freely (informed by the pina skill loaded as
instructions). The spec lands as an MLflow artifact and the URI is
recorded in state.problem_artifact_uri. The live PINA Problem instance
is built lazily by SolverNode where it is actually needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ure)

ModelNode owns one tool define_model(spec) — the agent designs the
architecture (FNN/FNO/KAN/DeepONet or custom torch module) tailored to
the registered Problem. The spec is logged as an MLflow artifact and
recorded in state.model_artifact_uri. No fixed enum: the agent composes
the spec freely, informed by the pina skill loaded as instructions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SolverNode owns define_solver(spec) — agent designs solver kind,
optimiser, scheduler, loss weights, epochs etc. freely based on the
registered Problem and Model. mlflow.pytorch.autolog (enabled by the
lead agent in Task 14) will capture training metrics+checkpoints when
pina.Trainer.fit() runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MLflowNode loads the mlflow skill as instructions and gets the mlflow
MCP toolset (mlflow mcp run over stdio). Test stubs the MCP builder
with an empty FunctionToolset so it does not need a live mlflow MCP
process, and pins TestModel with call_tools=[] to skip junk-arg auto
tool-calls — same pattern as NotebookNode tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
build_graph() returns the Graph[FlowState, FlowDeps, str] with all six
nodes wired in; start_node() is the convenience constructor for the
RouteNode entry point. Mermaid rendering is verified by an explicit
test so the lab notebook can render the live diagram.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
build_lead_agent() returns a single Pydantic-AI Agent with one tool
run_pina_workflow(intent) that drives the graph end-to-end. Enables
mlflow.pydantic_ai.autolog() and mlflow.pytorch.autolog() once
(idempotent module flag) so every nested sub-agent call and PINA
training run lands as nested traces under the active MLflow run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lead_chat() returns the async-generator callable mo.ui.chat expects:
(messages, config) -> AsyncIterator[str]. Bridges to the lead agent's
run_stream() and yields *deltas* (not accumulated text) — bandwidth
scales with new tokens, not total length.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
build_a2a_app() exposes the lead agent over the Agent2Agent protocol
(uvicorn-runnable) with a node_skills() AgentCard that advertises one
capability per sub-node role (define_problem, define_architecture,
define_solver, query_mlflow, edit_notebook). build_ag_ui_app() exposes
the same agent over the AG-UI protocol for non-marimo frontends.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Re-export the public surface (lead_chat, build_lead_agent, FlowState,
FlowDeps, get_model, build_graph, start_node, MLflowStatePersistence,
DEFAULT_MODELS) so notebook authors can `from marimo_flow.agents import ...`
without reaching into submodules.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The lab notebook now demonstrates the multi-agent team end-to-end:
config UI for Ollama base_url + MLflow URI + marimo MCP URL, builds
deps + graph + chat-adapter, exposes mo.ui.chat(lead_chat) for the
streaming conversation, and renders the live mermaid diagram of the
graph for visual inspection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drives the graph route->problem->route->model->route->solver->route->end
with deterministic TestModel decisions and a real MLflow file:// store.
Verifies the run reaches End and that problem/, model/, solver/, and
agent_state/ artifacts all land under the active MLflow run.

Two supporting fixes in src/ unblock build_graph() at runtime:

  * Each specialist node's `run` now declares `-> RouteNode` (forward ref);
    pydantic_graph rejected nodes that were missing a return-type hint.
  * `build_graph()` injects RouteNode and the five specialists into each
    node module's `__dict__` so `get_type_hints(cls.run)` can resolve the
    forward refs without introducing runtime import cycles. The
    `TYPE_CHECKING` imports are kept for static type checkers.

The test stubs `_define_problem` / `_define_model` / `_define_solver`
to log artifacts via `MlflowClient().log_artifact(run_id, ...)` because
no MLflow run is module-level-active inside `await graph.run(...)`, so
the production helpers' bare `mlflow.log_artifact` would silently miss
the test's `tmp_mlflow` run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-UI bug

Three review-driven fixes:

1. _define_problem/_define_model/_define_solver previously used the
   bare module-level mlflow.log_artifact, which silently misroutes
   artifacts when there is no module-level active run (the case
   inside `await graph.run(...)` since the lead agent's start_run
   context is suspended across awaits). Switch to MlflowClient with
   the explicit state.mlflow_run_id, mirroring the persistence layer.

2. test_graph_contains_all_six_nodes now compares string keys (the
   actual type returned by Graph.node_defs.keys()) instead of class
   objects.

3. test_ag_ui_app_is_asgi_callable marked xfail — pydantic-ai 1.84
   AGUIApp passes on_startup/on_shutdown to Starlette, which removed
   those kwargs. Add the [ag-ui] extra so the test reaches the failure
   point (not just an ImportError); re-enable once upstream is fixed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…AUDE.md)

README gets a new section explaining the team, role/skill mapping,
default models, and the three exposure surfaces (marimo chat, A2A,
AG-UI). CHANGELOG lists the agents module under [Unreleased] with
the OpenAIChatModel migration and the MlflowClient artifact-routing
fix called out. CLAUDE.md gets an Agents section so future Claude
sessions know to monkeypatch _define_* functions rather than
fighting TestModel auto-arg generation, and to stub MCP builders
with FunctionToolset in node tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bjoernbethge bjoernbethge merged commit d6c9b7c into main Apr 21, 2026
1 of 2 checks passed
@bjoernbethge bjoernbethge deleted the feat/agents-pina-team branch April 21, 2026 12:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants