Merge dev to main #124

Merged: keceli merged 147 commits into main from dev on Jun 2, 2026

tdpham2 commented May 7, 2026

Collaborator

keceli and others added 30 commits February 13, 2026 17:21
Implement human-in-the-loop interrupt/resume across single-agent,
multi-agent, CLI, and Streamlit UI. The ask_human tool is controlled
by a human_supervised parameter (default True) for backward
compatibility.

Changes by file:

- src/chemgraph/tools/generic_tools.py:
  Add ask_human tool using langgraph interrupt() to pause graph
  execution and return the human's response as a string. Supports
  both plain string and dict responses.

- src/chemgraph/prompt/single_agent_prompt.py:
  Extract the ask_human instruction block into _ASK_HUMAN_PROMPT_BLOCK
  constant. Add get_single_agent_prompt(human_supervised) helper that
  strips the block when human_supervised=False so the LLM is not
  instructed to call an unavailable tool.

- src/chemgraph/graphs/single_agent.py:
  Add human_supervised parameter to ChemGraphAgent() and
  construct_single_agent_graph(). When True (default), ask_human is
  included in default tools and force-added to custom tool lists.
  When False, ask_human is excluded entirely.

- src/chemgraph/state/multi_agent_state.py:
  Extend PlannerState to accept "ask_human" as a next_step literal
  and add optional clarification field for the question text.

- src/chemgraph/schemas/multi_agent_response.py:
  Extend PlannerResponse with "ask_human" next_step, clarification
  field, and legacy "question" key normalization in the model
  validator.

- src/chemgraph/prompt/multi_agent_prompt.py:
  Add Phase 1b (Ask Human for Clarification) instructions and JSON
  example to the planner prompt. Update Phase 2 to mention human
  clarification responses.

- src/chemgraph/graphs/multi_agent.py:
  Add human_review_node using interrupt() for human-in-the-loop.
  Route ask_human next_step to human_review via unified_planner_router.
  Wire human_review -> Planner edge in graph construction. Pass
  clarification from planner_agent output.

- src/chemgraph/agent/llm_agent.py:
  Add human_supervised and human_input_handler parameters to
  ChemGraph.__init__(). Add HumanInputRequired exception class.
  Implement _call_human_input_handler and _stream_until_interrupt
  for interrupt detection and human-in-the-loop resume loop in
  run(). Strip ask_human prompt when human_supervised=False.

- src/chemgraph/cli/commands.py:
  Handle HumanInputRequired in run_query(): stop spinner, show Rich
  panel with question, prompt user with Prompt.ask(), resume graph
  with Command(resume=answer). Loop until graph completes.

- src/ui/app.py:
  Handle HumanInputRequired in Streamlit UI: store interrupt in
  session_state, show warning with question, resume with
  Command(resume=answer) on next submit.

- tests/test_human_interrupt.py:
  Add comprehensive test suite: PlannerResponse ask_human schema
  tests, planner_agent routing tests, unified_planner_router tests,
  human_review_node interrupt tests, ask_human tool tests, graph
  construction tests for both human_supervised=True and False, and
  get_single_agent_prompt helper tests.
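
The prompt-stripping behaviour described for src/chemgraph/prompt/single_agent_prompt.py can be illustrated with a minimal sketch. The prompt wording and the `_BASE_PROMPT` name below are hypothetical; only `_ASK_HUMAN_PROMPT_BLOCK` and `get_single_agent_prompt(human_supervised)` are named in the commit message, and the real implementation may differ.

```python
# Hypothetical prompt text: the actual wording in single_agent_prompt.py
# is not shown in this PR.
_ASK_HUMAN_PROMPT_BLOCK = (
    "If the request is ambiguous, call the ask_human tool to ask for clarification.\n"
)

# Hypothetical base prompt that embeds the ask_human block.
_BASE_PROMPT = (
    "You are a computational chemistry assistant.\n"
    + _ASK_HUMAN_PROMPT_BLOCK
    + "Use the available tools to answer the query.\n"
)


def get_single_agent_prompt(human_supervised: bool) -> str:
    """Return the prompt, dropping the ask_human block when unsupervised."""
    if human_supervised:
        return _BASE_PROMPT
    # Remove the block so the LLM is never told about an unavailable tool.
    return _BASE_PROMPT.replace(_ASK_HUMAN_PROMPT_BLOCK, "")
```

Keeping the block in a single constant means the tool list and the prompt can be switched on or off together from one flag.
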
Establish a consistent architecture across all tool domains where
pure-Python logic lives in *_core.py modules and LangChain @tool /
MCP @mcp.tool wrappers are thin one-liner delegates.

Phase 1 - Rename and relocate:
- Rename tools/core.py -> tools/ase_core.py for clarity
- Move MACE schemas from parsl_tools.py to schemas/mace_parsl_schema.py
- Add missing __init__.py in mcp/ and schemas/calculators/

Phase 2 - Extract core modules:
- Create cheminformatics_core.py (deduplicate RDKit/PubChem logic
  from 3 files into single smiles_to_3d implementation)
- Create xanes_core.py (extract from xanes_tools.py, delete
  redundant xanes_mcp.py by merging into xanes_mcp_parsl.py)
- Create graspa_core.py (extract from graspa_tools.py)

Phase 3 - Consolidate shared utilities:
- Add extract_output_json_core to ase_core.py (was duplicated 3x)
- Consolidate load_parsl_config into mcp/server_utils.py (was 2x)
- Replace hardcoded element_map dicts in report_tools.py with
  ase.data.chemical_symbols (handles all elements, not just 1-18)

Phase 4 - Cleanup:
- Delete mcp_helper.py shim, update all consumers
- Refactor new_eval MCP example scripts from ~486 to ~88 lines each
- Fix stale import in utils/tool_call_eval.py

Net result: ~2769 lines removed, ~2783 added (new core modules),
with significant deduplication. All 322 existing tests pass.
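
As an illustration of the core/wrapper split described above, here is a minimal sketch. The `tool` decorator and `TOOL_REGISTRY` are stand-ins for LangChain's @tool and MCP's @mcp.tool, and `count_heavy_atoms_core` is a hypothetical function, not one from the repository.

```python
from typing import Callable, Dict, List

# Stand-in registry and decorator. The real wrappers use LangChain's @tool
# and MCP's @mcp.tool, which are deliberately not imported here.
TOOL_REGISTRY: Dict[str, Callable] = {}


def tool(func: Callable) -> Callable:
    """Register a function as a tool and return it unchanged."""
    TOOL_REGISTRY[func.__name__] = func
    return func


# Would live in a *_core.py module: plain Python, no framework imports.
def count_heavy_atoms_core(symbols: List[str]) -> int:
    """Count atoms that are not hydrogen."""
    return sum(1 for symbol in symbols if symbol != "H")


# Would live in the wrapper module: a thin one-line delegate.
@tool
def count_heavy_atoms(symbols: List[str]) -> int:
    """Count non-hydrogen atoms in a list of element symbols."""
    return count_heavy_atoms_core(symbols)
```

Because the core function has no framework dependency, it can be unit-tested directly and shared by several wrappers without duplicating logic.
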
- Replace custom HTML bubbles with st.chat_message and st.chat_input
- Add HumanInputRequired interrupt handling with resume flow
- Stream tool calls live via st.status (compact display with checkmarks)
- Fix broken ase_core import in visualization.py
- Fix duplicate widget keys for multi-query structure rendering
- Isolate per-query messages to prevent stale structure display
- Remove outdated Quick Help footer

- Change human_supervised default from True to False across agent,
  graph, CLI, and UI layers so ask_human is opt-in
- Add --human-supervised CLI flag and UI checkbox to enable it
- Thread human_supervised through initialize_agent, interactive_mode,
  and agent_manager
- Add human_supervised to config defaults (CLI and UI loaders)
- Refactor resume loop in run_query to use astream instead of ainvoke
  so tool-call output is printed during human-in-the-loop resume
- Update test to explicitly pass human_supervised=True
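
The opt-in behaviour of the new flag can be sketched with argparse. This is an assumption for illustration only: the commit does not say which argument parser the ChemGraph CLI uses, and `build_parser` and the program name are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser with an opt-in --human-supervised flag."""
    parser = argparse.ArgumentParser(prog="chemgraph-sketch")
    parser.add_argument("-q", "--query", required=True, help="Query text")
    # store_true gives a default of False, so ask_human stays disabled
    # unless the user passes the flag explicitly.
    parser.add_argument(
        "--human-supervised",
        action="store_true",
        help="Enable the ask_human tool (off by default)",
    )
    return parser


default_args = build_parser().parse_args(["-q", "optimize water"])
supervised_args = build_parser().parse_args(
    ["-q", "optimize water", "--human-supervised"]
)
```

Here `default_args.human_supervised` is False and `supervised_args.human_supervised` is True, matching the opt-in default described in the commit.
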

Resolve conflict in src/ui/_pages/main_interface.py: keep branding
import from dev, drop unused run_async_callable import.

feat: add human-in-the-loop + refactor tools into core/wrapper pattern

tdpham2 and others added 10 commits May 6, 2026 14:26
Sync k8s deployment with docker-compose.yml changes from 23433d0.
Add both variables as optional secret-backed env vars in the
deployment, secrets template, and documentation.

Trigger Docker image build on feature_k8s pushes, tagging as
ghcr.io/argonne-lcf/chemgraph:feature-k8s. Update deployment.yaml
to use the branch-specific image tag. The :latest tag is now
reserved for version tag (v*) releases only.

Fix Docker build failure caused by missing gfortran symbol
(_gfortran_concat_string) in tblite 0.5.0.

Hermes only needs amd64. This eliminates the slow arm64 build
that was blocking the manifest creation step.

The metadata action keeps the branch name as-is, so the tag is
feature_k8s, not feature-k8s. Update the workflow inspect step
and deployment.yaml to match.

…ation

Add --mcp-url and --mcp-command CLI arguments so users can connect to
MCP servers directly from the CLI without writing custom scripts:

  chemgraph run -q "..." --mcp-url http://localhost:9003/mcp/
  chemgraph run -q "..." --mcp-command "python -m chemgraph.mcp.mcp_tools"

MCP tools are loaded before agent initialization and passed to any
workflow (single_agent, multi_agent, etc.) via the existing tools
parameter.

Also fix a bug where MCP tool messages returning content as a list of
content blocks (e.g. [{'type': 'text', 'text': '...'}]) would cause
a Pydantic validation error when saving to the session store.
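
One way to handle list-valued tool content is to flatten it to a string before it reaches the session store. The commit does not show how the fix was implemented, so the `content_to_text` helper below is a hypothetical sketch of the idea, not the code that was merged.

```python
from typing import Any


def content_to_text(content: Any) -> str:
    """Flatten tool-message content into a plain string.

    Accepts either a string or a list of content blocks such as
    [{'type': 'text', 'text': '...'}].
    """
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts = []
        for block in content:
            if isinstance(block, dict) and block.get("type") == "text":
                # Text block: keep only its text payload.
                parts.append(str(block.get("text", "")))
            else:
                # Unknown block shape: fall back to its string form.
                parts.append(str(block))
        return "\n".join(parts)
    return str(content)
```

For example, `content_to_text([{'type': 'text', 'text': 'ok'}])` returns `'ok'`, which a string-typed field can store without a validation error.
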

Add dedicated deployment and service manifests for the ChemGraph
MCP server on port 9003, using streamable HTTP transport. Shares
the same image, secrets, and proxy config as the Streamlit deployment.

Deploy, manage, and tear down both chemgraph-streamlit and chemgraph-mcp
from a single script. Adds service-specific targets for logs and
port-forward commands. Replaces cluster-info connection check with
namespace-level check to support RBAC-limited users.

Add dev to ghcr-publish.yml branch triggers so merging to dev
builds a Docker image automatically. Update K8s manifests to use
the dev image tag. Add IMAGE_TAG env var to deploy.sh for
overriding the image tag at deploy time.

Add Kubernetes deployment support. Fully tested/deployed on ALCF Hermes.

tdpham2 commented May 7, 2026

Collaborator Author

Keep this open to document ChemGraph dev changes.

keceli and others added 17 commits May 30, 2026 18:41
IdealGasThermo previously hardcoded spin=0 regardless of the calculator's
spin state, silently ignoring user-configured multiplicity (e.g. UMA's
spin field on FAIRChemCalc). Each calculator schema also exposed spin
inconsistently or not at all.

- FAIRChemCalc: rename "spin" -> "multiplicity" (semantics were already
  multiplicity, just misnamed). "spin=" is still accepted as a deprecated
  alias via a model_validator. atoms.info["spin"] is still emitted because
  UMA reads it from there.
- TBLiteCalc, OrcaCalc: already had "multiplicity"; add get_multiplicity().
- NWChemCalc: add "multiplicity" and "charge", injected into the theory
  block (dft/mp2/ccsd/tce/tddft) as "mult"/"charge".
- Psi4Calc: add "multiplicity" and "charge", passed through to ASE Psi4.
- ase_tools and mcp_tools thermo blocks: derive S = (M-1)/2 from
  calc_model.get_multiplicity() (defaults to 1 = singlet) and pass to
  IdealGasThermo. parsl_tools is MACE-only, so spin=0 stays.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
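
The conversion S = (M-1)/2 used in the thermo blocks can be sketched as a small helper. The function name `spin_from_multiplicity` is hypothetical; the commit only states that the value is derived from calc_model.get_multiplicity() and passed to IdealGasThermo, whose spin argument is the total electronic spin.

```python
def spin_from_multiplicity(multiplicity: int = 1) -> float:
    """Convert spin multiplicity M = 2S + 1 to total spin S = (M - 1) / 2.

    The default of 1 corresponds to a singlet (S = 0).
    """
    if multiplicity < 1:
        raise ValueError("multiplicity must be a positive integer")
    return (multiplicity - 1) / 2


# Singlet, doublet, and triplet states.
singlet_spin = spin_from_multiplicity(1)
doublet_spin = spin_from_multiplicity(2)
triplet_spin = spin_from_multiplicity(3)
```

A singlet gives 0.0, a doublet 0.5, and a triplet 1.0, so a triplet species such as ground-state O2 would no longer be treated as spin 0.
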
- Limit default pytest discovery to the tests/ directory so legacy evaluation scripts are not collected during normal test runs.
- Declare deepdiff in pyproject.toml because chemgraph.utils.tool_call_eval imports DeepDiff at runtime.

keceli commented Jun 2, 2026

Contributor

Release ChemGraph v0.5.0

This PR merges the current dev branch into main for the ChemGraph 0.5.0 release.

Highlights

  • Updates package metadata to version 0.5.0 and fixes source-checkout version reporting so UI deployments do not report the version as "unknown".
  • Modernizes the Streamlit UI with modular pages, improved chat/session behavior, available-calculator display, better math/report rendering, HTML report downloads, and build/host metadata.
  • Adds calculator availability detection during agent initialization and improves calculator selection, including xTB/TBLite alias handling.
  • Expands agent workflows with human-in-the-loop support, single-agent routing tests, retry/session fixes, and safer state serialization.
  • Adds and improves CLI, memory/session persistence, model routing, MCP client support, RAG, XANES, and evaluation tooling.
  • Adds Kubernetes, GHCR, Streamlit Cloud, HPC, and MCP deployment documentation and examples.
  • Improves CI reliability with dependency pins, Windows serializer fixes, Ruff cleanup, and expanded tests.

keceli merged commit f65fb9f into main on Jun 2, 2026
26 checks passed

3 participants