feat(agents): PINA multi-agent team with pydantic-graph + MLflow + A2A/AG-UI by bjoernbethge · Pull Request #33 · synapticore-io/marimo-flow

bjoernbethge · 2026-04-21T11:49:46Z

Summary

New marimo_flow.agents package: multi-agent PINA team orchestrated by pydantic-graph, persisted + traced via MLflow, exposed via marimo chat + A2A + AG-UI
RouteNode classifier dispatches between five specialists (Notebook, Problem, Model, Solver, MLflow); each loads its .claude/Skills/<name>/SKILL.md as lazy instructions= (no message-history bloat)
Lead agent (build_lead_agent) wraps the graph as a single tool — same backend powers mo.ui.chat (lead_chat), agent.to_a2a() (with AgentCard node_skills), and agent.to_ag_ui()
Models: Ollama Cloud (:cloud tags) via OpenAIChatModel — single http://localhost:11434/v1 endpoint covers local + cloud
State (FlowState) is JSON-serialisable + persisted as MLflow artifacts (MLflowStatePersistence); live PINA/torch objects live in FlowDeps.registry keyed by URI
examples/lab.py rewritten as full chat demo with state inspector and live mermaid diagram

Plan & Tasks

Followed docs/superpowers/plans/2026-04-21-marimo-flow-pina-team.md — all 20 tasks complete.

Plan-Bugs caught & fixed mid-execution

OpenAIModel deprecated in pydantic-ai 1.84 → switched to OpenAIChatModel
SimpleStatePersistence.load_all raises NotImplementedError → use FullStatePersistence instead
from __future__ import annotations made the planned __annotations__["next_node"].__args__ introspection break → use typing.get_args(...model_fields[...].annotation)
TestModel calls all tools by default with junk args + MCP toolsets connect eagerly at agent.run() → notebook/mlflow tests stub build_*_mcp with FunctionToolset and pin TestModel(call_tools=[])
pydantic-graph requires -> RouteNode return annotations on BaseNode.run → added to all 5 sub-nodes; build_graph() injects classes into module globals so get_type_hints resolves the forward refs without runtime import cycles
_define_problem/_define_model/_define_solver used bare mlflow.log_artifact → silently misroutes when no module-level active run (the await graph.run(...) case). Switched to MlflowClient().log_artifact(state.mlflow_run_id, ...)

Known issues

test_ag_ui_app_is_asgi_callable is xfail — pydantic-ai 1.84 AGUIApp passes on_startup/on_shutdown to Starlette which removed those kwargs. Re-enable when upstream is fixed.

Test plan

uv run pytest tests/agents/ — 48 passed, 1 xfailed
uv run ruff check src/marimo_flow/agents/ tests/agents/ examples/lab.py — clean
E2E smoke test (tests/agents/test_e2e.py) drives the full route→problem→route→model→route→solver→route→end loop with real MLflow file:// store and verifies all four artifact dirs (problem/, model/, solver/, agent_state/) land
Manual: marimo edit examples/lab.py with marimo MCP (marimo edit --mcp --no-token --port 2718 --headless) + mlflow ui running, ask the lead agent to solve a 1D Poisson problem
Manual: uv run python -m marimo_flow.agents.server.a2a — confirm A2A AgentCard at http://localhost:8000/.well-known/agent.json advertises the 5 sub-node skills
Manual: uv run python -m marimo_flow.agents.server.ag_ui — only after upstream pydantic-ai/Starlette fix

🤖 Generated with Claude Code

20-task plan for src/marimo_flow/agents/ — pydantic-graph orchestration, MLflow persistence + tracing, Ollama Cloud LLMs, lead agent exposed via marimo chat + A2A + AG-UI, sub-agents loaded with .claude/Skills/<name>/ SKILL.md as instructions=, MCP toolsets for marimo + mlflow servers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds the empty package skeleton (src/marimo_flow/agents + nodes + server, tests/agents + tests/agents/test_nodes) and pulls in the [a2a] extra on pydantic-ai-slim (gives us FastA2A for the A2A server in Task 16). Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

State holds only IDs and message history — live PINA/torch objects live in FlowDeps.registry, keyed by their MLflow artifact URIs. This split keeps the state JSON-serialisable so it can be persisted as MLflow artifacts (see Task 6). Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

DEFAULT_MODELS maps each role (route/notebook/problem/model/solver/ mlflow/lead) to its preferred Ollama tag — including ':cloud' tags that Ollama proxies to Ollama Cloud, so a single OpenAI-compatible endpoint covers both local and cloud LLMs without any extra proxy. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Pydantic-AI 1.84 renamed OpenAIModel to OpenAIChatModel; the old name emits a DeprecationWarning that would propagate across every sub-node that calls get_model(). Switch at the source. Also runs ruff format to satisfy the 88-char line-length rule. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Loads .claude/Skills/<name>/SKILL.md (project-local, falls back to ~/.claude/skills/<name>/SKILL.md) so each sub-agent gets the same domain knowledge a Claude Code session would. build_skill_instructions returns a no-arg callable suitable for pydantic-ai's Agent(instructions=) hook — lazy (per-run reload), supports multiple skills per role, and stays out of message history (no token bloat). Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

build_marimo_mcp() points at http://127.0.0.1:2718/mcp/server (the URL in .vscode/mcp.json); build_mlflow_mcp() spawns `mlflow mcp run` over stdio with MLFLOW_TRACKING_URI configured. build_mcp_servers() is a generic transport-selector for ad-hoc use, lifted from the marimo-agent RAG demo with the ollama-specific bits stripped. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Extends pydantic-graph's FullStatePersistence so graph snapshots also land as JSON artifacts under the active MLflow run. Adds pytest-asyncio to dev deps and asyncio_mode=auto so async graph tests work without @pytest.mark.asyncio decorators on every test. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

RouteNode runs a tiny classifier agent (gemma4-fast by default) that returns one of {notebook, problem, model, solver, mlflow, end} via a Pydantic Literal output_type. The five sub-nodes land as stubs in this task — they only set state.last_node and loop back to RouteNode. Real implementations follow in Tasks 8-12. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

NotebookNode loads the marimo + marimo-pair skills as lazy instructions and gets the marimo MCP toolset for cell ops. Tests stub the MCP builder with an empty FunctionToolset so they do not need a live marimo server, and pin TestModel with call_tools=[] to skip junk-arg auto-tool-calls. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

…Problem) ProblemNode owns a single tool define_problem(spec) — the agent designs the PDE/BC/domain spec freely (informed by the pina skill loaded as instructions). The spec lands as an MLflow artifact and the URI is recorded in state.problem_artifact_uri. The live PINA Problem instance is built lazily by SolverNode where it is actually needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ure) ModelNode owns one tool define_model(spec) — the agent designs the architecture (FNN/FNO/KAN/DeepONet or custom torch module) tailored to the registered Problem. The spec is logged as an MLflow artifact and recorded in state.model_artifact_uri. No fixed enum: the agent composes the spec freely, informed by the pina skill loaded as instructions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

SolverNode owns define_solver(spec) — agent designs solver kind, optimiser, scheduler, loss weights, epochs etc. freely based on the registered Problem and Model. mlflow.pytorch.autolog (enabled by the lead agent in Task 14) will capture training metrics+checkpoints when pina.Trainer.fit() runs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

MLflowNode loads the mlflow skill as instructions and gets the mlflow MCP toolset (mlflow mcp run over stdio). Test stubs the MCP builder with an empty FunctionToolset so it does not need a live mlflow MCP process, and pins TestModel with call_tools=[] to skip junk-arg auto tool-calls — same pattern as NotebookNode tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

build_graph() returns the Graph[FlowState, FlowDeps, str] with all six nodes wired in; start_node() is the convenience constructor for the RouteNode entry point. Mermaid rendering is verified by an explicit test so the lab notebook can render the live diagram. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

build_lead_agent() returns a single Pydantic-AI Agent with one tool run_pina_workflow(intent) that drives the graph end-to-end. Enables mlflow.pydantic_ai.autolog() and mlflow.pytorch.autolog() once (idempotent module flag) so every nested sub-agent call and PINA training run lands as nested traces under the active MLflow run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

lead_chat() returns the async-generator callable mo.ui.chat expects: (messages, config) -> AsyncIterator[str]. Bridges to the lead agent's run_stream() and yields *deltas* (not accumulated text) — bandwidth scales with new tokens, not total length. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

build_a2a_app() exposes the lead agent over the Agent2Agent protocol (uvicorn-runnable) with a node_skills() AgentCard that advertises one capability per sub-node role (define_problem, define_architecture, define_solver, query_mlflow, edit_notebook). build_ag_ui_app() exposes the same agent over the AG-UI protocol for non-marimo frontends. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Re-export the public surface (lead_chat, build_lead_agent, FlowState, FlowDeps, get_model, build_graph, start_node, MLflowStatePersistence, DEFAULT_MODELS) so notebook authors can `from marimo_flow.agents import ...` without reaching into submodules. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The lab notebook now demonstrates the multi-agent team end-to-end: config UI for Ollama base_url + MLflow URI + marimo MCP URL, builds deps + graph + chat-adapter, exposes mo.ui.chat(lead_chat) for the streaming conversation, and renders the live mermaid diagram of the graph for visual inspection. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Drives the graph route->problem->route->model->route->solver->route->end with deterministic TestModel decisions and a real MLflow file:// store. Verifies the run reaches End and that problem/, model/, solver/, and agent_state/ artifacts all land under the active MLflow run. Two supporting fixes in src/ unblock build_graph() at runtime: * Each specialist node's `run` now declares `-> RouteNode` (forward ref); pydantic_graph rejected nodes that were missing a return-type hint. * `build_graph()` injects RouteNode and the five specialists into each node module's `__dict__` so `get_type_hints(cls.run)` can resolve the forward refs without introducing runtime import cycles. The `TYPE_CHECKING` imports are kept for static type checkers. The test stubs `_define_problem` / `_define_model` / `_define_solver` to log artifacts via `MlflowClient().log_artifact(run_id, ...)` because no MLflow run is module-level-active inside `await graph.run(...)`, so the production helpers' bare `mlflow.log_artifact` would silently miss the test's `tmp_mlflow` run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…-UI bug Three review-driven fixes: 1. _define_problem/_define_model/_define_solver previously used the bare module-level mlflow.log_artifact, which silently misroutes artifacts when there is no module-level active run (the case inside `await graph.run(...)` since the lead agent's start_run context is suspended across awaits). Switch to MlflowClient with the explicit state.mlflow_run_id, mirroring the persistence layer. 2. test_graph_contains_all_six_nodes now compares string keys (the actual type returned by Graph.node_defs.keys()) instead of class objects. 3. test_ag_ui_app_is_asgi_callable marked xfail — pydantic-ai 1.84 AGUIApp passes on_startup/on_shutdown to Starlette, which removed those kwargs. Add the [ag-ui] extra so the test reaches the failure point (not just an ImportError); re-enable once upstream is fixed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…AUDE.md) README gets a new section explaining the team, role/skill mapping, default models, and the three exposure surfaces (marimo chat, A2A, AG-UI). CHANGELOG lists the agents module under [Unreleased] with the OpenAIChatModel migration and the MlflowClient artifact-routing fix called out. CLAUDE.md gets an Agents section so future Claude sessions know to monkeypatch _define_* functions rather than fighting TestModel auto-arg generation, and to stub MCP builders with FunctionToolset in node tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

claude and others added 24 commits April 21, 2026 01:24

style(agents): apply ruff isort to test_state.py

df0151e

bjoernbethge merged commit d6c9b7c into main Apr 21, 2026
1 of 2 checks passed

bjoernbethge deleted the feat/agents-pina-team branch April 21, 2026 12:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(agents): PINA multi-agent team with pydantic-graph + MLflow + A2A/AG-UI#33

feat(agents): PINA multi-agent team with pydantic-graph + MLflow + A2A/AG-UI#33
bjoernbethge merged 24 commits into
mainfrom
feat/agents-pina-team

bjoernbethge commented Apr 21, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

bjoernbethge commented Apr 21, 2026

Summary

Plan & Tasks

Plan-Bugs caught & fixed mid-execution

Known issues

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants