Implement human-in-the-loop interrupt/resume across single-agent, multi-agent, CLI, and Streamlit UI. The ask_human tool is controlled by a human_supervised parameter (default True) for backward compatibility.

Changes by file:
- src/chemgraph/tools/generic_tools.py: Add ask_human tool using langgraph interrupt() to pause graph execution and return the human's response as a string. Supports both plain string and dict responses.
- src/chemgraph/prompt/single_agent_prompt.py: Extract the ask_human instruction block into _ASK_HUMAN_PROMPT_BLOCK constant. Add get_single_agent_prompt(human_supervised) helper that strips the block when human_supervised=False so the LLM is not instructed to call an unavailable tool.
- src/chemgraph/graphs/single_agent.py: Add human_supervised parameter to ChemGraphAgent() and construct_single_agent_graph(). When True (default), ask_human is included in default tools and force-added to custom tool lists. When False, ask_human is excluded entirely.
- src/chemgraph/state/multi_agent_state.py: Extend PlannerState to accept "ask_human" as a next_step literal and add optional clarification field for the question text.
- src/chemgraph/schemas/multi_agent_response.py: Extend PlannerResponse with "ask_human" next_step, clarification field, and legacy "question" key normalization in the model validator.
- src/chemgraph/prompt/multi_agent_prompt.py: Add Phase 1b (Ask Human for Clarification) instructions and JSON example to the planner prompt. Update Phase 2 to mention human clarification responses.
- src/chemgraph/graphs/multi_agent.py: Add human_review_node using interrupt() for human-in-the-loop. Route ask_human next_step to human_review via unified_planner_router. Wire human_review -> Planner edge in graph construction. Pass clarification from planner_agent output.
- src/chemgraph/agent/llm_agent.py: Add human_supervised and human_input_handler parameters to ChemGraph.__init__(). Add HumanInputRequired exception class. Implement _call_human_input_handler and _stream_until_interrupt for interrupt detection and human-in-the-loop resume loop in run(). Strip ask_human prompt when human_supervised=False.
- src/chemgraph/cli/commands.py: Handle HumanInputRequired in run_query(): stop spinner, show Rich panel with question, prompt user with Prompt.ask(), resume graph with Command(resume=answer). Loop until graph completes.
- src/ui/app.py: Handle HumanInputRequired in Streamlit UI: store interrupt in session_state, show warning with question, resume with Command(resume=answer) on next submit.
- tests/test_human_interrupt.py: Add comprehensive test suite: PlannerResponse ask_human schema tests, planner_agent routing tests, unified_planner_router tests, human_review_node interrupt tests, ask_human tool tests, graph construction tests for both human_supervised=True and False, and get_single_agent_prompt helper tests.
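The prompt and tool gating described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the commit: the prompt text, `_BASE_PROMPT`, and `resolve_tools` are hypothetical, and tools are represented by plain name strings. Only the names `_ASK_HUMAN_PROMPT_BLOCK`, `get_single_agent_prompt`, and `ask_human` come from the commit message.

```python
# Hypothetical prompt text; the real prompt in single_agent_prompt.py is not shown here.
_ASK_HUMAN_PROMPT_BLOCK = (
    "If the request is ambiguous, call the ask_human tool to ask for clarification.\n"
)

_BASE_PROMPT = (
    "You are a computational chemistry assistant.\n"
    + _ASK_HUMAN_PROMPT_BLOCK
    + "Use the available tools to answer the query.\n"
)


def get_single_agent_prompt(human_supervised: bool) -> str:
    """Return the prompt, dropping the ask_human block when unsupervised."""
    if human_supervised:
        return _BASE_PROMPT
    # Strip the block so the LLM is never told to call a tool it does not have.
    return _BASE_PROMPT.replace(_ASK_HUMAN_PROMPT_BLOCK, "")


def resolve_tools(custom_tools: list[str], human_supervised: bool) -> list[str]:
    """Hypothetical helper: force-add or exclude ask_human in a tool list."""
    tools = [name for name in custom_tools if name != "ask_human"]
    if human_supervised:
        tools.append("ask_human")
    return tools
```

Keeping the prompt and the tool list behind the same flag means the two cannot drift apart: the model is only told about ask_human when the tool is actually bound.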
Establish a consistent architecture across all tool domains where pure-Python logic lives in *_core.py modules and LangChain @tool / MCP @mcp.tool wrappers are thin one-liner delegates.

Phase 1 - Rename and relocate:
- Rename tools/core.py -> tools/ase_core.py for clarity
- Move MACE schemas from parsl_tools.py to schemas/mace_parsl_schema.py
- Add missing __init__.py in mcp/ and schemas/calculators/

Phase 2 - Extract core modules:
- Create cheminformatics_core.py (deduplicate RDKit/PubChem logic from 3 files into single smiles_to_3d implementation)
- Create xanes_core.py (extract from xanes_tools.py, delete redundant xanes_mcp.py by merging into xanes_mcp_parsl.py)
- Create graspa_core.py (extract from graspa_tools.py)

Phase 3 - Consolidate shared utilities:
- Add extract_output_json_core to ase_core.py (was duplicated 3x)
- Consolidate load_parsl_config into mcp/server_utils.py (was 2x)
- Replace hardcoded element_map dicts in report_tools.py with ase.data.chemical_symbols (handles all elements, not just 1-18)

Phase 4 - Cleanup:
- Delete mcp_helper.py shim, update all consumers
- Refactor new_eval MCP example scripts from ~486 to ~88 lines each
- Fix stale import in utils/tool_call_eval.py

Net result: ~2769 lines removed, ~2783 added (new core modules), with significant deduplication. All 322 existing tests pass.
- Replace custom HTML bubbles with st.chat_message and st.chat_input
- Add HumanInputRequired interrupt handling with resume flow
- Stream tool calls live via st.status (compact display with checkmarks)
- Fix broken ase_core import in visualization.py
- Fix duplicate widget keys for multi-query structure rendering
- Isolate per-query messages to prevent stale structure display
- Remove outdated Quick Help footer
- Change human_supervised default from True to False across agent, graph, CLI, and UI layers so ask_human is opt-in
- Add --human-supervised CLI flag and UI checkbox to enable it
- Thread human_supervised through initialize_agent, interactive_mode, and agent_manager
- Add human_supervised to config defaults (CLI and UI loaders)
- Refactor resume loop in run_query to use astream instead of ainvoke so tool-call output is printed during human-in-the-loop resume
- Update test to explicitly pass human_supervised=True
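An opt-in boolean flag of this kind could look like the sketch below. It assumes argparse purely for illustration; the actual ChemGraph CLI may use a different parser, and `build_parser` is a hypothetical name. Only the `--human-supervised` flag and the `-q` option appear in this conversation.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser showing an opt-in --human-supervised flag."""
    parser = argparse.ArgumentParser(prog="chemgraph")
    parser.add_argument("-q", "--query", required=True, help="Query to run")
    # store_true makes the flag default to False, so ask_human is opt-in.
    parser.add_argument(
        "--human-supervised",
        action="store_true",
        help="Allow the agent to pause and ask the user for clarification",
    )
    return parser


args = build_parser().parse_args(["-q", "optimize water", "--human-supervised"])
# argparse converts the dash to an underscore: args.human_supervised
```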
Resolve conflict in src/ui/_pages/main_interface.py: keep branding import from dev, drop unused run_async_callable import.
feat: add human-in-the-loop + refactor tools into core/wrapper pattern
Sync k8s deployment with docker-compose.yml changes from 23433d0. Add both variables as optional secret-backed env vars in the deployment, secrets template, and documentation.
Trigger Docker image build on feature_k8s pushes, tagging as ghcr.io/argonne-lcf/chemgraph:feature-k8s. Update deployment.yaml to use the branch-specific image tag. The :latest tag is now reserved for version tag (v*) releases only.
Fix Docker build failure caused by missing gfortran symbol (_gfortran_concat_string) in tblite 0.5.0.
Hermes only needs amd64. This eliminates the slow arm64 build that was blocking the manifest creation step.
The metadata action keeps the branch name as-is, so the tag is feature_k8s not feature-k8s. Update the workflow inspect step and deployment.yaml to match.
…ation
Add --mcp-url and --mcp-command CLI arguments so users can connect to
MCP servers directly from the CLI without writing custom scripts:
chemgraph run -q "..." --mcp-url http://localhost:9003/mcp/
chemgraph run -q "..." --mcp-command "python -m chemgraph.mcp.mcp_tools"
MCP tools are loaded before agent initialization and passed to any
workflow (single_agent, multi_agent, etc.) via the existing tools
parameter.
Also fix a bug where MCP tool messages returning content as a list of
content blocks (e.g. [{'type': 'text', 'text': '...'}]) would cause
a Pydantic validation error when saving to the session store.
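One way to guard against the content-block issue is to flatten the content to a plain string before it is stored. The sketch below is an assumption about the general approach, not the fix from the commit; `normalize_tool_content` is a hypothetical name.

```python
def normalize_tool_content(content) -> str:
    """Flatten tool-message content to a plain string.

    Accepts either a string or a list of content blocks such as
    [{'type': 'text', 'text': '...'}].
    """
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts = []
        for block in content:
            if isinstance(block, dict) and block.get("type") == "text":
                parts.append(str(block.get("text", "")))
            else:
                # Non-text or unexpected blocks are kept as their string form.
                parts.append(str(block))
        return "\n".join(parts)
    return str(content)
```

For example, `normalize_tool_content([{'type': 'text', 'text': 'done'}])` yields the string `'done'`, which a string-typed field can accept.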
Add dedicated deployment and service manifests for the ChemGraph MCP server on port 9003, using streamable HTTP transport. Shares the same image, secrets, and proxy config as the Streamlit deployment.
Deploy, manage, and tear down both chemgraph-streamlit and chemgraph-mcp from a single script. Adds service-specific targets for logs and port-forward commands. Replaces cluster-info connection check with namespace-level check to support RBAC-limited users.
Add dev to ghcr-publish.yml branch triggers so merging to dev builds a Docker image automatically. Update K8s manifests to use the dev image tag. Add IMAGE_TAG env var to deploy.sh for overriding the image tag at deploy time.
Add Kubernetes deployment support. Fully tested/deployed on ALCF Hermes.
Keep this open to document ChemGraph dev changes.
IdealGasThermo previously hardcoded spin=0 regardless of the calculator's spin state, silently ignoring user-configured multiplicity (e.g. UMA's spin field on FAIRChemCalc). Each calculator schema also exposed spin inconsistently or not at all.

- FAIRChemCalc: rename "spin" -> "multiplicity" (semantics were already multiplicity, just misnamed). "spin=" is still accepted as a deprecated alias via a model_validator. atoms.info["spin"] is still emitted because UMA reads it from there.
- TBLiteCalc, OrcaCalc: already had "multiplicity"; add get_multiplicity().
- NWChemCalc: add "multiplicity" and "charge", injected into the theory block (dft/mp2/ccsd/tce/tddft) as "mult"/"charge".
- Psi4Calc: add "multiplicity" and "charge", passed through to ASE Psi4.
- ase_tools and mcp_tools thermo blocks: derive S = (M-1)/2 from calc_model.get_multiplicity() (defaults to 1 = singlet) and pass to IdealGasThermo. parsl_tools is MACE-only, so spin=0 stays.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
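The conversion S = (M-1)/2 is simple enough to show directly. The helper below is a hypothetical sketch (`spin_from_multiplicity` is not a name from the commit); its result is the kind of value ASE's IdealGasThermo expects for its spin argument, where a singlet is 0, a doublet 0.5, and a triplet 1.0.

```python
def spin_from_multiplicity(multiplicity: int = 1) -> float:
    """Convert spin multiplicity M = 2S + 1 to total spin S = (M - 1) / 2."""
    if multiplicity < 1:
        raise ValueError("multiplicity must be a positive integer")
    return (multiplicity - 1) / 2


# Singlet, doublet, and triplet map to S = 0, 0.5, and 1.0 respectively.
spins = [spin_from_multiplicity(m) for m in (1, 2, 3)]
```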
- Limit default pytest discovery to the tests/ directory so legacy evaluation scripts are not collected during normal test runs.
- Declare deepdiff in pyproject.toml because chemgraph.utils.tool_call_eval imports DeepDiff at runtime.
Release ChemGraph v0.5.0
This PR merges the current
Highlights