fill_holes no-op path leaves native CUDA cache allocated


## Background

While investigating GPU memory growth in a long-running Trellis2 FastAPI service, we found that the process's baseline GPU memory usage kept increasing while `torch.cuda.memory_reserved()` stayed stable. This points to native CUDA allocations made outside the PyTorch caching allocator.
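A minimal sketch of how that comparison can be made, using only `torch.cuda.mem_get_info()` and `torch.cuda.memory_reserved()`. The helper names are hypothetical and not part of the service. Note that `mem_get_info()` is device-wide, so the result also includes the CUDA context and any other process on the same GPU; the trend over time matters more than the absolute value.

```python
def non_torch_bytes(free, total, reserved):
    """Device memory in use that the PyTorch caching allocator does not hold."""
    return (total - free) - reserved


def sample_non_torch_mib():
    # Imported lazily so the arithmetic above can be used without a GPU.
    import torch

    torch.cuda.synchronize()
    free, total = torch.cuda.mem_get_info()   # device-wide free/total bytes
    reserved = torch.cuda.memory_reserved()   # bytes held by PyTorch's allocator
    return non_torch_bytes(free, total, reserved) / 1024 / 1024
```

If this value keeps rising across requests while `memory_reserved()` is flat, the growth is coming from allocations PyTorch does not track.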

The issue was isolated to `CuMesh::fill_holes()`.

This may be the cause of https://github.com/microsoft/TRELLIS.2/issues/136.

## Problem

`CuMesh::fill_holes()` can retain internal connectivity/boundary cache on no-op / early-return paths.

Typical case:

1. Call `fill_holes()` once on an open mesh. The hole is filled successfully.
2. Call `fill_holes()` again on the same mesh. The mesh is already closed, so this should be a no-op.
3. The second call still rebuilds internal caches such as `edges` and boundary adjacency.
4. The function returns early without calling `clear_cache()`.

As a result, the `CuMesh` object keeps extra native CUDA memory until `clear_cache()` is called manually or the object is destroyed.
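Until the early-return paths are fixed, a caller-side workaround is to pair every `fill_holes()` call with `clear_cache()`. The sketch below assumes only the two methods used in this report; the wrapper name and the `threshold` parameter name are hypothetical, and the value is passed positionally exactly as in the repro script.

```python
def fill_holes_and_release(mesh, threshold):
    """Run fill_holes() and always drop the cached connectivity afterwards."""
    try:
        mesh.fill_holes(threshold)
    finally:
        # Runs on the normal path, the no-op path, and if fill_holes() raises.
        mesh.clear_cache()
```

The trade-off is that any cache a later operation could have reused is rebuilt on demand.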

## Repro Script

Environment:

- GPU: NVIDIA H20
- CUDA: 12.8
- Python: 3.12.3
- Mesh: generated `768 x 768` grid mesh

```python
import gc
import torch
from cumesh import CuMesh


def mib(x):
    return x / 1024 / 1024


def used_mib():
    torch.cuda.synchronize()
    free, total = torch.cuda.mem_get_info()
    return mib(total - free)


def cleanup():
    gc.collect()
    torch.cuda.empty_cache()
    torch.cuda.synchronize()


def make_grid(n=768):
    # (n + 1) x (n + 1) vertices on the unit square in the z = 0 plane.
    y, x = torch.meshgrid(
        torch.linspace(0, 1, n + 1, device="cuda"),
        torch.linspace(0, 1, n + 1, device="cuda"),
        indexing="ij",
    )
    vertices = torch.stack([x, y, torch.zeros_like(x)], -1).reshape(-1, 3).contiguous()

    # Row/column index of each of the n x n grid cells.
    cy, cx = torch.meshgrid(
        torch.arange(n, device="cuda", dtype=torch.int64),
        torch.arange(n, device="cuda", dtype=torch.int64),
        indexing="ij",
    )
    # Vertex index of each cell's (row, col) corner.
    base = cy * (n + 1) + cx
    # Split every cell into two triangles along its diagonal.
    f1 = torch.stack([base, base + 1, base + n + 2], -1)
    f2 = torch.stack([base, base + n + 2, base + n + 1], -1)
    faces = torch.stack([f1, f2], 2).reshape(-1, 3).to(torch.int32).contiguous()
    return vertices, faces


v, f = make_grid()
cleanup()
base = used_mib()

mesh = CuMesh()
mesh.init(v, f)
cleanup()
print(f"after init:        delta={used_mib() - base:+.2f} MiB E={mesh.num_edges} B={mesh.num_boundaries}")

mesh.fill_holes(9999.0)
cleanup()
print(f"after first fill:  delta={used_mib() - base:+.2f} MiB E={mesh.num_edges} B={mesh.num_boundaries}")

mesh.fill_holes(9999.0)
cleanup()
print(f"after second fill: delta={used_mib() - base:+.2f} MiB E={mesh.num_edges} B={mesh.num_boundaries}")

mesh.clear_cache()
cleanup()
print(f"after clear_cache: delta={used_mib() - base:+.2f} MiB E={mesh.num_edges} B={mesh.num_boundaries}")
```

## Actual Result Before Fix

```text
after init:        delta=+22.00 MiB E=0 B=0
after first fill:  delta=+24.00 MiB E=0 B=0
after second fill: delta=+162.00 MiB E=1774080 B=0
after clear_cache: delta=+24.00 MiB E=0 B=0
```

The second `fill_holes()` call should be a no-op, but `E=1774080` shows that the internal edge/connectivity cache is retained, at a cost of about 138 MiB (+162 MiB versus +24 MiB). Calling `clear_cache()` brings usage back to +24 MiB, confirming that the retained memory is CuMesh cache.
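As a sanity check, the reported edge count can be reproduced from the grid size alone, which supports reading it as the full edge list of the filled mesh. The last step rests on an assumption about how the hole is closed (one new edge per boundary edge, for example a fan around a single new vertex); the match is consistent with that but does not prove it.

```python
n = 768

# Edges of an n x n grid split into triangles:
# horizontal and vertical grid edges, plus one diagonal per cell.
grid_edges = 2 * n * (n + 1) + n * n

# Edges on the outer border of the open grid.
boundary_edges = 4 * n

# Assumption: the fill adds one new edge per boundary edge.
edges_after_fill = grid_edges + boundary_edges

print(grid_edges, boundary_edges, edges_after_fill)
# prints: 1771008 3072 1774080
```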

## Expected Result After Fix

```text
after init:        delta=+22.00 MiB E=0 B=0
after first fill:  delta=+24.00 MiB E=0 B=0
after second fill: delta=+24.00 MiB E=0 B=0
after clear_cache: delta=+24.00 MiB E=0 B=0
```

The second no-op `fill_holes()` call should not leave connectivity/boundary cache behind.
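The difference between the two behaviors can be modeled in a few lines of pure Python. This is a toy stand-in, not the CuMesh C++/CUDA implementation: `ToyMesh` and its methods are hypothetical, and the point is only that cleanup has to run on the early-return path too.

```python
class ToyMesh:
    """Pure-Python stand-in for an object that builds a cache on demand."""

    def __init__(self, num_boundaries):
        self.num_boundaries = num_boundaries
        self.cache = None  # stands in for edges / boundary adjacency

    def _build_cache(self):
        self.cache = {"edges": [0] * 1000}

    def clear_cache(self):
        self.cache = None

    def fill_holes_leaky(self):
        self._build_cache()
        if self.num_boundaries == 0:
            return  # early return: the cache stays allocated
        self.num_boundaries = 0
        self.clear_cache()

    def fill_holes_fixed(self):
        self._build_cache()
        try:
            if self.num_boundaries == 0:
                return  # still a no-op, but cleanup runs below
            self.num_boundaries = 0
        finally:
            self.clear_cache()
```

Calling `fill_holes_leaky()` on a closed `ToyMesh(0)` leaves `cache` populated, while `fill_holes_fixed()` leaves it as `None` on every path.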

Closes #32 

