Summary
The converter cancels float → int → float cast round-trips as if they were no-ops, silently dropping the truncation semantics of the intermediate integer cast. The classic "floor for bounded values" idiom (x + K).long().float() - K therefore compiles to the identity function — on every compute unit, including CPU.
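To make the dropped semantics concrete, here is a minimal pure-Python sketch of the idiom on scalars. The helper name `floor_via_cast` is hypothetical (not part of torch or coreai), and `int()` stands in for the float → int cast on the assumption that both truncate toward zero.

```python
import math


def floor_via_cast(x: float, k: float = 64.0) -> float:
    """Scalar stand-in for (x + k).long().float() - k.

    int() truncates toward zero, so the result equals floor(x)
    only while x + k is non-negative.
    """
    return float(int(x + k)) - k


samples = [0.3, 1.7, -0.4, -1.6]
results = [floor_via_cast(v) for v in samples]
expected = [float(math.floor(v)) for v in samples]

print(results)   # [0.0, 1.0, -1.0, -2.0]
print(expected)  # [0.0, 1.0, -1.0, -2.0]

# Outside the bound (x + k < 0) truncation rounds toward zero, not down.
print(floor_via_cast(-64.5), math.floor(-64.5))  # -64.0 -65
```

The last line shows why this is only a floor for bounded values. It also shows that the cast pair changes the value for any non-integer input, so removing it is not a no-op.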
Environment
- coreai-torch 0.4.0, coreai-core 1.0.0b1 (cp312), torch 2.11.0
- macOS 27.0 (build 26A5353q), M4 Max
Minimal repro
import asyncio, shutil
from pathlib import Path

import torch
import coreai.runtime as rt
from coreai_torch import TorchConverter, get_decomp_table


class M(torch.nn.Module):
    def forward(self, x):
        return (x + 64.0).long().float() - 64.0  # floor(x) for x > -64


x = torch.tensor([0.3, 1.7, -0.4, -1.6])
ep = torch.export.export(M().eval(), (x,)).run_decompositions(get_decomp_table())
prog = TorchConverter().add_exported_program(exported_program=ep, input_names=["x"], output_names=["y"]).to_coreai()
prog.optimize()

out = Path("/tmp/cast_pair.aimodel"); shutil.rmtree(out, ignore_errors=True)
prog.save_asset(out, rt.AIModelAssetMetadata())


async def run():
    m = await rt.AIModel.load(out, rt.SpecializationOptions.cpu_only())
    return (await m.load_function("main")({"x": rt.NDArray(x.numpy())}))["y"].numpy()


print(asyncio.run(run()))
# got:      [ 0.3  1.7 -0.4 -1.6]  (identity)
# expected: [ 0.   1.  -1.  -2. ]  (torch eager)
Expected
A float→int cast truncates toward zero, so the cast pair is removable only when its input is provably integer-valued (and within the integer type's range). Either keep the casts or restrict the cancellation to provably integer-valued producers.
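The restriction above could be expressed as a guard on the fold. The following is a rough sketch only: `Node`, `INTEGER_VALUED_OPS`, and `can_fold_roundtrip` are hypothetical names for illustration and do not reflect the converter's actual IR or pass structure.

```python
from dataclasses import dataclass

# Hypothetical set of ops whose float outputs are always integer-valued.
INTEGER_VALUED_OPS = {"floor", "ceil", "round", "trunc", "int_to_float"}


@dataclass(frozen=True)
class Node:
    """Toy IR node: an op name plus its input nodes."""
    op: str
    inputs: tuple = ()


def can_fold_roundtrip(producer: Node) -> bool:
    """Return True if float -> int -> float applied to `producer` is a no-op.

    Conservative: anything not known to be integer-valued keeps its casts.
    """
    return producer.op in INTEGER_VALUED_OPS


x = Node("input")
shifted = Node("add", (x, Node("const")))  # x + 64.0, as in the repro
floored = Node("floor", (x,))

print(can_fold_roundtrip(shifted))  # False: the casts must stay
print(can_fold_roundtrip(floored))  # True: the casts are redundant
```

A real pass would also need to handle NaN, infinities, and values outside the integer type's range, which this sketch ignores.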
Notes
One-directional casts consumed by integer-typed ops (e.g. gather indices) behave correctly — only the round-trip is folded. Found while porting RF-DETR's deformable-attention bilinear sampling. Related: with aten.floor unavailable on the GPU delegate (separate issue), this fold also removes the natural workaround.