feat(agents): PINA multi-agent team with pydantic-graph + MLflow + A2A/AG-UI#33
Merged
Conversation
20-task plan for src/marimo_flow/agents/ — pydantic-graph orchestration, MLflow persistence + tracing, Ollama Cloud LLMs, lead agent exposed via marimo chat + A2A + AG-UI, sub-agents loaded with .claude/Skills/<name>/ SKILL.md as instructions=, MCP toolsets for marimo + mlflow servers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the empty package skeleton (src/marimo_flow/agents + nodes + server, tests/agents + tests/agents/test_nodes) and pulls in the [a2a] extra on pydantic-ai-slim (gives us FastA2A for the A2A server in Task 16). Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
State holds only IDs and message history — live PINA/torch objects live in FlowDeps.registry, keyed by their MLflow artifact URIs. This split keeps the state JSON-serialisable so it can be persisted as MLflow artifacts (see Task 6). Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
DEFAULT_MODELS maps each role (route/notebook/problem/model/solver/ mlflow/lead) to its preferred Ollama tag — including ':cloud' tags that Ollama proxies to Ollama Cloud, so a single OpenAI-compatible endpoint covers both local and cloud LLMs without any extra proxy. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Pydantic-AI 1.84 renamed OpenAIModel to OpenAIChatModel; the old name emits a DeprecationWarning that would propagate across every sub-node that calls get_model(). Switch at the source. Also runs ruff format to satisfy the 88-char line-length rule. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Loads .claude/Skills/<name>/SKILL.md (project-local, falls back to ~/.claude/skills/<name>/SKILL.md) so each sub-agent gets the same domain knowledge a Claude Code session would. build_skill_instructions returns a no-arg callable suitable for pydantic-ai's Agent(instructions=) hook — lazy (per-run reload), supports multiple skills per role, and stays out of message history (no token bloat). Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
build_marimo_mcp() points at http://127.0.0.1:2718/mcp/server (the URL in .vscode/mcp.json); build_mlflow_mcp() spawns `mlflow mcp run` over stdio with MLFLOW_TRACKING_URI configured. build_mcp_servers() is a generic transport-selector for ad-hoc use, lifted from the marimo-agent RAG demo with the ollama-specific bits stripped. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Extends pydantic-graph's FullStatePersistence so graph snapshots also land as JSON artifacts under the active MLflow run. Adds pytest-asyncio to dev deps and asyncio_mode=auto so async graph tests work without @pytest.mark.asyncio decorators on every test. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
RouteNode runs a tiny classifier agent (gemma4-fast by default) that
returns one of {notebook, problem, model, solver, mlflow, end} via a
Pydantic Literal output_type. The five sub-nodes land as stubs in this
task — they only set state.last_node and loop back to RouteNode. Real
implementations follow in Tasks 8-12.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
NotebookNode loads the marimo + marimo-pair skills as lazy instructions and gets the marimo MCP toolset for cell ops. Tests stub the MCP builder with an empty FunctionToolset so they do not need a live marimo server, and pin TestModel with call_tools=[] to skip junk-arg auto-tool-calls. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…Problem) ProblemNode owns a single tool define_problem(spec) — the agent designs the PDE/BC/domain spec freely (informed by the pina skill loaded as instructions). The spec lands as an MLflow artifact and the URI is recorded in state.problem_artifact_uri. The live PINA Problem instance is built lazily by SolverNode where it is actually needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ure) ModelNode owns one tool define_model(spec) — the agent designs the architecture (FNN/FNO/KAN/DeepONet or custom torch module) tailored to the registered Problem. The spec is logged as an MLflow artifact and recorded in state.model_artifact_uri. No fixed enum: the agent composes the spec freely, informed by the pina skill loaded as instructions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SolverNode owns define_solver(spec) — agent designs solver kind, optimiser, scheduler, loss weights, epochs etc. freely based on the registered Problem and Model. mlflow.pytorch.autolog (enabled by the lead agent in Task 14) will capture training metrics+checkpoints when pina.Trainer.fit() runs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MLflowNode loads the mlflow skill as instructions and gets the mlflow MCP toolset (mlflow mcp run over stdio). Test stubs the MCP builder with an empty FunctionToolset so it does not need a live mlflow MCP process, and pins TestModel with call_tools=[] to skip junk-arg auto tool-calls — same pattern as NotebookNode tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
build_graph() returns the Graph[FlowState, FlowDeps, str] with all six nodes wired in; start_node() is the convenience constructor for the RouteNode entry point. Mermaid rendering is verified by an explicit test so the lab notebook can render the live diagram. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
build_lead_agent() returns a single Pydantic-AI Agent with one tool run_pina_workflow(intent) that drives the graph end-to-end. Enables mlflow.pydantic_ai.autolog() and mlflow.pytorch.autolog() once (idempotent module flag) so every nested sub-agent call and PINA training run lands as nested traces under the active MLflow run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lead_chat() returns the async-generator callable mo.ui.chat expects: (messages, config) -> AsyncIterator[str]. Bridges to the lead agent's run_stream() and yields *deltas* (not accumulated text) — bandwidth scales with new tokens, not total length. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
build_a2a_app() exposes the lead agent over the Agent2Agent protocol (uvicorn-runnable) with a node_skills() AgentCard that advertises one capability per sub-node role (define_problem, define_architecture, define_solver, query_mlflow, edit_notebook). build_ag_ui_app() exposes the same agent over the AG-UI protocol for non-marimo frontends. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Re-export the public surface (lead_chat, build_lead_agent, FlowState, FlowDeps, get_model, build_graph, start_node, MLflowStatePersistence, DEFAULT_MODELS) so notebook authors can `from marimo_flow.agents import ...` without reaching into submodules. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The lab notebook now demonstrates the multi-agent team end-to-end: config UI for Ollama base_url + MLflow URI + marimo MCP URL, builds deps + graph + chat-adapter, exposes mo.ui.chat(lead_chat) for the streaming conversation, and renders the live mermaid diagram of the graph for visual inspection. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drives the graph route->problem->route->model->route->solver->route->end
with deterministic TestModel decisions and a real MLflow file:// store.
Verifies the run reaches End and that problem/, model/, solver/, and
agent_state/ artifacts all land under the active MLflow run.
Two supporting fixes in src/ unblock build_graph() at runtime:
* Each specialist node's `run` now declares `-> RouteNode` (forward ref);
pydantic_graph rejected nodes that were missing a return-type hint.
* `build_graph()` injects RouteNode and the five specialists into each
node module's `__dict__` so `get_type_hints(cls.run)` can resolve the
forward refs without introducing runtime import cycles. The
`TYPE_CHECKING` imports are kept for static type checkers.
The test stubs `_define_problem` / `_define_model` / `_define_solver`
to log artifacts via `MlflowClient().log_artifact(run_id, ...)` because
no MLflow run is module-level-active inside `await graph.run(...)`, so
the production helpers' bare `mlflow.log_artifact` would silently miss
the test's `tmp_mlflow` run.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-UI bug Three review-driven fixes: 1. _define_problem/_define_model/_define_solver previously used the bare module-level mlflow.log_artifact, which silently misroutes artifacts when there is no module-level active run (the case inside `await graph.run(...)` since the lead agent's start_run context is suspended across awaits). Switch to MlflowClient with the explicit state.mlflow_run_id, mirroring the persistence layer. 2. test_graph_contains_all_six_nodes now compares string keys (the actual type returned by Graph.node_defs.keys()) instead of class objects. 3. test_ag_ui_app_is_asgi_callable marked xfail — pydantic-ai 1.84 AGUIApp passes on_startup/on_shutdown to Starlette, which removed those kwargs. Add the [ag-ui] extra so the test reaches the failure point (not just an ImportError); re-enable once upstream is fixed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…AUDE.md) README gets a new section explaining the team, role/skill mapping, default models, and the three exposure surfaces (marimo chat, A2A, AG-UI). CHANGELOG lists the agents module under [Unreleased] with the OpenAIChatModel migration and the MlflowClient artifact-routing fix called out. CLAUDE.md gets an Agents section so future Claude sessions know to monkeypatch _define_* functions rather than fighting TestModel auto-arg generation, and to stub MCP builders with FunctionToolset in node tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
marimo_flow.agentspackage: multi-agent PINA team orchestrated bypydantic-graph, persisted + traced via MLflow, exposed via marimo chat + A2A + AG-UIRouteNodeclassifier dispatches between five specialists (Notebook,Problem,Model,Solver,MLflow); each loads its.claude/Skills/<name>/SKILL.mdas lazyinstructions=(no message-history bloat)build_lead_agent) wraps the graph as a single tool — same backend powersmo.ui.chat(lead_chat),agent.to_a2a()(with AgentCardnode_skills), andagent.to_ag_ui():cloudtags) viaOpenAIChatModel— singlehttp://localhost:11434/v1endpoint covers local + cloudFlowState) is JSON-serialisable + persisted as MLflow artifacts (MLflowStatePersistence); live PINA/torch objects live inFlowDeps.registrykeyed by URIexamples/lab.pyrewritten as full chat demo with state inspector and live mermaid diagramPlan & Tasks
Followed
docs/superpowers/plans/2026-04-21-marimo-flow-pina-team.md— all 20 tasks complete.Plan-Bugs caught & fixed mid-execution
OpenAIModeldeprecated in pydantic-ai 1.84 → switched toOpenAIChatModelSimpleStatePersistence.load_allraisesNotImplementedError→ useFullStatePersistenceinsteadfrom __future__ import annotationsmade the planned__annotations__["next_node"].__args__introspection break → usetyping.get_args(...model_fields[...].annotation)TestModelcalls all tools by default with junk args + MCP toolsets connect eagerly atagent.run()→ notebook/mlflow tests stubbuild_*_mcpwithFunctionToolsetand pinTestModel(call_tools=[])pydantic-graphrequires-> RouteNodereturn annotations onBaseNode.run→ added to all 5 sub-nodes;build_graph()injects classes into module globals soget_type_hintsresolves the forward refs without runtime import cycles_define_problem/_define_model/_define_solverused baremlflow.log_artifact→ silently misroutes when no module-level active run (theawait graph.run(...)case). Switched toMlflowClient().log_artifact(state.mlflow_run_id, ...)Known issues
test_ag_ui_app_is_asgi_callableisxfail— pydantic-ai 1.84AGUIApppasseson_startup/on_shutdownto Starlette which removed those kwargs. Re-enable when upstream is fixed.Test plan
uv run pytest tests/agents/— 48 passed, 1 xfaileduv run ruff check src/marimo_flow/agents/ tests/agents/ examples/lab.py— cleantests/agents/test_e2e.py) drives the full route→problem→route→model→route→solver→route→end loop with real MLflow file:// store and verifies all four artifact dirs (problem/,model/,solver/,agent_state/) landmarimo edit examples/lab.pywith marimo MCP (marimo edit --mcp --no-token --port 2718 --headless) +mlflow uirunning, ask the lead agent to solve a 1D Poisson problemuv run python -m marimo_flow.agents.server.a2a— confirm A2A AgentCard athttp://localhost:8000/.well-known/agent.jsonadvertises the 5 sub-node skillsuv run python -m marimo_flow.agents.server.ag_ui— only after upstream pydantic-ai/Starlette fix🤖 Generated with Claude Code