
fix(neighbors): delegate batched method selection to ops #106

Draft
nikitafedik wants to merge 1 commit into NVIDIA:main from nikitafedik:fix/batched-neighbor-method-guard

Conversation

@nikitafedik

@nikitafedik nikitafedik commented Jun 6, 2026


ALCHEMI Toolkit Pull Request

Description

Align Toolkit neighbor-list construction with current Toolkit-Ops dispatch:
compute_neighbors and NeighborListHook now pass batched metadata
(batch_idx / batch_ptr) without forcing an explicit neighbor-list method.
That lets Toolkit-Ops choose the correct batched strategy via method=None.

This pairs with an upstream Toolkit-Ops guard that rejects explicit
single-system methods such as method="naive" or method="cell_list" when
batched metadata is provided.
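
A guard of the kind described above can be sketched in a few lines. This is a hypothetical illustration, not the Toolkit-Ops source: the function name `check_method` and the two method sets are assumptions, and only the method strings themselves come from this PR.

```python
# Hypothetical sketch of a batched-method guard; not the Toolkit-Ops code.
SINGLE_SYSTEM_METHODS = {"naive", "cell_list"}


def check_method(method, batch_idx=None, batch_ptr=None):
    """Reject single-system methods when batched metadata is supplied."""
    batched = batch_idx is not None or batch_ptr is not None
    if batched and method in SINGLE_SYSTEM_METHODS:
        raise ValueError(
            f"method={method!r} ignores batch boundaries; "
            "use a batch_* method or method=None"
        )
    # None is passed through unchanged: it means "let the dispatcher decide".
    return method
```

With a guard like this, explicit misuse fails at the call site, while `method=None` and the `batch_*` methods pass through untouched.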

The observed failure mode is that forcing method="naive" can connect atoms
from different batched systems as neighbors. The model then treats those
cross-system edges like real neighbor interactions/messages.

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Performance improvement
  • Documentation update
  • Refactoring (no functional changes)
  • CI/CD or infrastructure change

Related Issues

Relates to neighbor-list batching reports where explicit unbatched Toolkit-Ops
methods can create cross-graph neighbor edges.

Changes Made

  • Removed Toolkit's internal explicit batched-method selector.
  • Updated compute_neighbors and NeighborListHook to leave method=None when calling Toolkit-Ops with batched metadata.
  • Removed pre-allocation/forwarding of algorithm-specific scratch kwargs from NeighborListHook; Toolkit-Ops chooses among geometry-dependent strategies at dispatch time.
  • Added graph-boundary regression coverage for compute_neighbors and strengthened NeighborListHook boundary assertions.
  • Added tests that verify Toolkit delegates method selection instead of passing stale explicit methods or scratch kwargs.
  • Updated CHANGELOG.md.
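
The delegation pattern in the list above can be sketched with stand-ins. Everything here is hypothetical: `fake_neighbor_list` only records what it receives, and `compute_neighbors_sketch` plays the role of a Toolkit-side caller; neither is the real Toolkit or Toolkit-Ops code.

```python
# Hypothetical stand-ins, not the real Toolkit or Toolkit-Ops code.
recorded_calls = []


def fake_neighbor_list(**kwargs):
    """Record the keyword arguments instead of building a neighbor list."""
    recorded_calls.append(kwargs)


def compute_neighbors_sketch(positions, cutoff, batch_idx, batch_ptr,
                             neighbor_list=fake_neighbor_list):
    """Forward batch metadata and leave method selection to the backend."""
    neighbor_list(
        positions=positions,
        cutoff=cutoff,
        batch_idx=batch_idx,
        batch_ptr=batch_ptr,
        method=None,  # no explicit method: the backend picks the strategy
    )


compute_neighbors_sketch(
    positions=[(0.0, 0.0, 0.0)] * 6,
    cutoff=1.2,
    batch_idx=[0, 0, 0, 1, 1, 1],
    batch_ptr=[0, 3, 6],
)
forwarded = recorded_calls[-1]
print(forwarded["method"], sorted(forwarded))
```

The recorded call carries only the batch metadata and `method=None`, with no algorithm-specific scratch arguments alongside it.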

Testing

  • Unit tests pass locally (make pytest)
  • Linting passes (make lint)
  • New tests added for new functionality meet coverage expectations

Ran locally:

WARP_CACHE_PATH=/tmp/warp-cache \
TRITON_CACHE_DIR=/tmp/triton-cache \
TORCHINDUCTOR_CACHE_DIR=/tmp/torchinductor-cache \
TORCH_COMPILE_DISABLE=1 \
python -m pytest test/hooks/test_neighbor_list_hook.py -q

Result: 123 passed, 16 warnings.

ruff check nvalchemi/neighbors.py nvalchemi/hooks/neighbor_list.py test/hooks/test_neighbor_list_hook.py

Result: All checks passed!

Coverage added in test/hooks/test_neighbor_list_hook.py:

  • test_compute_neighbors_multi_graph_isolation verifies one-shot neighbor
    construction does not create cross-graph neighbor entries.
  • test_multi_graph_isolation now checks every valid NeighborListHook
    neighbor entry for src_system == dst_system.
  • test_compute_neighbors_delegates_method_selection verifies
    compute_neighbors leaves method=None.
  • TestAllocNlKwargs verifies NeighborListHook does not forward stale
    algorithm-specific scratch kwargs while Toolkit-Ops owns method selection.
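
The src_system == dst_system check amounts to the following, shown here on plain lists. This is a simplified, hypothetical variant (`find_cross_system_edges`), not the `_assert_no_cross_graph_neighbors` helper from the test file; it assumes a padded neighbor matrix in which only the first `num_neighbors[src]` entries of each row are valid.

```python
def find_cross_system_edges(neighbor_matrix, num_neighbors, batch_idx):
    """Return (src, dst) pairs whose endpoints lie in different systems.

    Only the first num_neighbors[src] entries of each row are valid;
    the rest is padding and is ignored.
    """
    bad = []
    for src, row in enumerate(neighbor_matrix):
        for dst in row[: num_neighbors[src]]:
            if batch_idx[src] != batch_idx[dst]:
                bad.append((src, dst))
    return bad


PAD = 4  # padding value (here the atom count); never read as an index
batch_idx = [0, 0, 1, 1]
good = [[1, PAD], [0, PAD], [3, PAD], [2, PAD]]
leaky = [[1, 2], [0, PAD], [3, PAD], [2, PAD]]  # atom 0 -> atom 2 crosses

assert find_cross_system_edges(good, [1, 1, 1, 1], batch_idx) == []
assert find_cross_system_edges(leaky, [2, 1, 1, 1], batch_idx) == [(0, 2)]
```

Checking every valid entry, rather than a sample, is what makes the assertion a boundary regression test.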

Here "hooks" refers to Toolkit runtime hooks, not Git or CI hooks. CI exercises
them by running pytest; the tests instantiate and call NeighborListHook
directly.

Live H2O boundary probe against the patched Toolkit branch:

Toolkit compute_neighbors    total_edges= 16 cross_edges=  0 examples=[]
Toolkit NeighborListHook     total_edges= 16 cross_edges=  0 examples=[]
ops method=batch_naive       total_edges= 16 cross_edges=  0 examples=[]
ops method=naive             raises ValueError with upstream Toolkit-Ops guard
ops method=batch_cell_list   total_edges= 16 cross_edges=  0 examples=[]
ops method=cell_list         raises ValueError with upstream Toolkit-Ops guard
ops method=None              total_edges= 16 cross_edges=  0 examples=[]

Checklist

  • I have read and understand the Contributing Guidelines
  • I have updated the CHANGELOG.md
  • I have performed a self-review of my code
  • I have added docstrings to new functions/classes
  • I have updated the documentation (if applicable)

Additional Notes

This PR intentionally does not change the Toolkit public API. It updates
Toolkit's own neighbor-list callers to use Toolkit-Ops' official batched
auto-dispatch path.

The paired upstream Toolkit-Ops PR should land first or alongside this one so
direct explicit misuse fails loudly instead of silently treating a concatenated
batch as one system.

The current CONTRIBUTING.md says public direct contributions are not accepted
during the initial public beta, and signed-off commits are required.

@copy-pr-bot

copy-pr-bot Bot commented Jun 6, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@greptile-apps

greptile-apps Bot commented Jun 6, 2026

Contributor

Greptile Summary

Fixes a correctness bug where NeighborListHook and compute_neighbors pre-allocated algorithm-specific scratch buffers (shift-range tensors, cell-list arrays) and inadvertently steered Toolkit-Ops toward single-system algorithms that could connect atoms across batch-graph boundaries. The fix is to stop passing those scratch kwargs entirely and let Toolkit-Ops auto-select the correct batched strategy via method=None.

  • _alloc_nl_kwargs is reduced to self._buf_nl_kwargs = {}, and the cell-list / naive-PBC scratch allocation logic is fully removed. _alloc_staging_buffers no longer accepts or uses batch_ptr since it was only needed by the old kwarg pre-allocation.
  • Tests in TestAllocNlKwargs are rewritten: the old assertions that expected specific pre-computed keys are replaced by assertions that _buf_nl_kwargs is empty; a new monkeypatched test verifies no stale scratch keys are forwarded.
  • _assert_no_cross_graph_neighbors is extracted as a shared helper and used to strengthen test_multi_graph_isolation, plus two new tests exercising compute_neighbors directly.

Important Files Changed

| Filename | Overview |
| --- | --- |
| nvalchemi/hooks/neighbor_list.py | Removed algorithm-specific kwarg pre-allocation from _alloc_nl_kwargs; _alloc_staging_buffers signature simplified (batch_ptr removed); neighbor_list calls now pass no explicit method, delegating selection to Toolkit-Ops |
| test/hooks/test_neighbor_list_hook.py | TestAllocNlKwargs updated to verify empty kwargs; added _assert_no_cross_graph_neighbors helper; added multi-graph isolation and method-delegation tests for both NeighborListHook and compute_neighbors |
| CHANGELOG.md | Prepended entry for batched neighbor-list dispatch alignment fix |

Reviews (3): Last reviewed commit: "fix(neighbors): delegate batched method ..." | Re-trigger Greptile

Comment thread test/hooks/test_neighbor_list_hook.py Outdated
Comment on lines +1018 to +1045
    def test_compute_neighbors_multi_graph_isolation(self, device: str):
        """compute_neighbors must not build neighbors across Batch graph boundaries."""
        from nvalchemi.neighbors import compute_neighbors

        batch = _line_batch(device, n_graphs=4)
        compute_neighbors(batch, cutoff=_CUTOFF, max_neighbors=16)

        _assert_no_cross_graph_neighbors(batch)

    def test_compute_neighbors_passes_explicit_batched_method(
        self, device: str, monkeypatch: pytest.MonkeyPatch
    ):
        """Toolkit should not rely on implicit Toolkit-Ops method selection."""
        from nvalchemi import neighbors as neighbors_mod
        from nvalchemi.neighbors import compute_neighbors

        methods: list[str | None] = []

        def fake_neighbor_list(**kwargs):
            methods.append(kwargs.get("method"))
            kwargs["num_neighbors"].zero_()

        monkeypatch.setattr(neighbors_mod, "neighbor_list", fake_neighbor_list)

        batch = _line_batch(device, n_graphs=4)
        compute_neighbors(batch, cutoff=_CUTOFF, max_neighbors=16)

        assert methods == ["batch_naive"]

Contributor

P2 New compute_neighbors tests placed in wrong class

test_compute_neighbors_multi_graph_isolation and test_compute_neighbors_passes_explicit_batched_method are appended to TestAdaptiveK, which is focused on neighbor-count overflow and shrinkage behaviour. Both tests cover graph-boundary isolation and explicit method dispatch — neither exercises the adaptive-K machinery. These would be easier to discover in a dedicated class, e.g. TestComputeNeighbors.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment thread nvalchemi/neighbors.py Outdated
cutoff=cutoff,
cell=cell,
pbc=pbc,
method=_select_batched_neighbor_list_method(N, batch.num_graphs),

Contributor

P2 Method string recomputed on every overflow retry

_select_batched_neighbor_list_method(N, batch.num_graphs) is evaluated on each loop iteration, even though N and batch.num_graphs are invariant inside the while True block. Computing it once before the loop would make the intent clearer. Not a correctness issue.
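
The suggested hoist looks roughly like this. The sketch is hypothetical throughout: `select_method` and its threshold are illustrative stand-ins for the selector named in the comment, and doubling `max_neighbors` on overflow is an assumption about the retry policy, not something this thread confirms.

```python
selector_calls = 0


def select_method(n_atoms, n_graphs):
    """Hypothetical selector; the threshold is illustrative only."""
    global selector_calls
    selector_calls += 1
    return "batch_naive" if n_atoms < 1000 else "batch_cell_list"


def build_with_retry(n_atoms, n_graphs, needed, max_neighbors=4):
    """Grow max_neighbors until `needed` neighbors fit."""
    # n_atoms and n_graphs do not change inside the loop, so the method
    # is selected once here instead of on every retry.
    method = select_method(n_atoms, n_graphs)
    while True:
        if needed <= max_neighbors:  # stand-in for "no overflow reported"
            return method, max_neighbors
        max_neighbors *= 2


result = build_with_retry(n_atoms=12, n_graphs=4, needed=20)
print(result, selector_calls)  # ('batch_naive', 32) 1
```

The loop retries three times here, yet the selector runs only once.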

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@nikitafedik

Author

Reproduced against nvalchemi-toolkit-ops main.

Ops main checkout:

803008406345cf9d8b5c48a5afc2653d0f29d2bf

I put the ops-main source checkout first on PYTHONPATH and ran the standalone script below. The forced unbatched methods still cross system boundaries; automatic dispatch and explicit batched methods do not.

Output:

nvalchemiops: /home/nfedik/projects/toolkit/batching-bug/runs/nvalchemi-toolkit-ops-main/nvalchemiops/__init__.py
torch device: cuda

method           total_edges cross_edges examples
--------------------------------------------------------------------------------
batch_naive               16           0 []
naive                    100          84 ['0->3 (system 0->1, O0->O0)', '0->4 (system 0->1, O0->H1)', '0->5 (system 0->1, O0->H2)', '0->6 (system 0->2, O0->O0)', '0->7 (system 0->2, O0->H1)']
batch_cell_list           16           0 []
cell_list                100          84 ['0->3 (system 0->1, O0->O0)', '0->4 (system 0->1, O0->H1)', '0->5 (system 0->1, O0->H2)', '0->6 (system 0->2, O0->O0)', '0->7 (system 0->2, O0->H1)']
None                      16           0 []

Script:

#!/usr/bin/env python3
"""Reproduce cross-system neighbor edges with forced unbatched NL methods.

Run against Toolkit-Ops main by putting the source checkout first on PYTHONPATH,
for example:

    WARP_CACHE_PATH=/tmp/warp-cache-batching-bug \
    PYTHONPATH=/path/to/nvalchemi-toolkit-ops-main \
    python repro_ops_main_neighbor_boundaries.py
"""

from __future__ import annotations

import os

os.environ.setdefault("WARP_CACHE_PATH", "/tmp/warp-cache-batching-bug")
os.environ.setdefault("XDG_CACHE_HOME", "/tmp/torch-cache-batching-bug")
os.environ.setdefault("TRITON_CACHE_DIR", "/tmp/triton-cache-batching-bug")
os.environ.setdefault("TORCHINDUCTOR_CACHE_DIR", "/tmp/torchinductor-cache-batching-bug")
for cache_dir in (
    os.environ["WARP_CACHE_PATH"],
    os.environ["XDG_CACHE_HOME"],
    os.environ["TRITON_CACHE_DIR"],
    os.environ["TORCHINDUCTOR_CACHE_DIR"],
):
    os.makedirs(cache_dir, exist_ok=True)

import torch
import nvalchemiops
from nvalchemiops.torch.neighbors import neighbor_list


def build_batched_waters(
    n_systems: int = 4,
    device: torch.device | str = "cpu",
) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor, list[str]]:
    """Return identical water-like systems packed into one coordinate array."""
    one_water = torch.tensor(
        [
            [0.0, 0.0, 0.0],
            [0.968565, 0.0, 0.0],
            [-0.242, 0.928, 0.0],
        ],
        dtype=torch.float32,
        device=device,
    )
    positions = one_water.repeat(n_systems, 1)
    batch_idx = torch.repeat_interleave(
        torch.arange(n_systems, dtype=torch.int32, device=device), 3
    )
    batch_ptr = torch.arange(
        0, 3 * n_systems + 1, 3, dtype=torch.int32, device=device
    )
    atom_names = ["O0", "H1", "H2"] * n_systems
    return positions, batch_idx, batch_ptr, atom_names


def count_cross_edges(
    neighbor_matrix: torch.Tensor,
    num_neighbors: torch.Tensor,
    batch_idx: torch.Tensor,
    atom_names: list[str],
) -> tuple[int, int, list[str]]:
    total_edges = 0
    cross_edges = 0
    examples: list[str] = []

    for src in range(neighbor_matrix.shape[0]):
        src_system = int(batch_idx[src].item())
        for dst in neighbor_matrix[src, : int(num_neighbors[src].item())].tolist():
            dst = int(dst)
            total_edges += 1
            dst_system = int(batch_idx[dst].item())
            if src_system != dst_system:
                cross_edges += 1
                if len(examples) < 5:
                    examples.append(
                        f"{src}->{dst} "
                        f"(system {src_system}->{dst_system}, "
                        f"{atom_names[src]}->{atom_names[dst]})"
                    )

    return total_edges, cross_edges, examples


def run_case(method: str | None) -> tuple[str, int, int, list[str]]:
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    positions, batch_idx, batch_ptr, atom_names = build_batched_waters(device=device)
    n_atoms = positions.shape[0]
    neighbor_matrix = torch.full(
        (n_atoms, 32), n_atoms, dtype=torch.int32, device=device
    )
    num_neighbors = torch.zeros(n_atoms, dtype=torch.int32, device=device)

    neighbor_list(
        positions=positions,
        cutoff=1.2,
        batch_idx=batch_idx,
        batch_ptr=batch_ptr,
        max_neighbors=32,
        neighbor_matrix=neighbor_matrix,
        num_neighbors=num_neighbors,
        method=method,
    )
    total_edges, cross_edges, examples = count_cross_edges(
        neighbor_matrix.cpu(),
        num_neighbors.cpu(),
        batch_idx.cpu(),
        atom_names,
    )
    return str(method), total_edges, cross_edges, examples


def main() -> None:
    print(f"nvalchemiops: {nvalchemiops.__file__}")
    print(f"torch device: {'cuda' if torch.cuda.is_available() else 'cpu'}")
    print()
    print(f"{'method':16s} {'total_edges':>11s} {'cross_edges':>11s} examples")
    print("-" * 80)

    for method in ("batch_naive", "naive", "batch_cell_list", "cell_list", None):
        label, total_edges, cross_edges, examples = run_case(method)
        print(f"{label:16s} {total_edges:11d} {cross_edges:11d} {examples}")


if __name__ == "__main__":
    main()

@nikitafedik nikitafedik force-pushed the fix/batched-neighbor-method-guard branch from 24e312b to af576e7 on June 8, 2026 21:31
@nikitafedik nikitafedik changed the title from "fix(neighbors): select batched methods explicitly" to "fix(neighbors): delegate batched method selection to ops" on Jun 8, 2026
Signed-off-by: Nikita Fedik <nfedik@nvidia.com>
@nikitafedik nikitafedik force-pushed the fix/batched-neighbor-method-guard branch from af576e7 to d3480a3 on June 8, 2026 21:40
@nikitafedik

Author

Updated after bot review: removed the unused _alloc_nl_kwargs arguments and the stale batch_ptr/plumbing comment in _alloc_staging_buffers. Re-ran ruff, diff-check, focused allocation tests, and the full test/hooks/test_neighbor_list_hook.py file: 123 passed.

@nikitafedik nikitafedik marked this pull request as draft June 8, 2026 22:57