Fix RescaleIntensity on images larger than 2**24 voxels #1474
torch.quantile raises "input tensor is too large" for inputs with more than 2**24 elements, so RescaleIntensity/Normalize failed on high-resolution volumes (e.g. Colin27(2008), ~57M voxels), even with the default 0/100 percentiles. Route percentile computation through a helper that returns min/max for the 0 and 100 endpoints (exact, and avoids torch.quantile for the common default) and estimates interior quantiles of oversized tensors from a deterministic strided subsample.
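The routing described above can be sketched in plain Python. This is an illustrative stand-in rather than the torchio helper: the name `quantile_sketch`, the `cap` parameter, the ceiling-division stride, and the use of a list in place of a tensor are all assumptions made for the example.

```python
import math


def quantile_sketch(values, q, cap=2**24):
    """Hypothetical helper: exact endpoints, strided subsample for the interior.

    `q` is assumed to lie in [0, 1]; `values` is a non-empty list of numbers.
    """
    if q == 0:
        return float(min(values))  # exact, no sorting or subsampling
    if q == 1:
        return float(max(values))
    if len(values) > cap:
        step = -(-len(values) // cap)  # ceiling division, deterministic stride
        values = values[::step]
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)  # fractional 0-based rank
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


values = list(range(101))
print(quantile_sketch(values, 1.0, cap=10))  # endpoint: uses max, ignores the cap
print(quantile_sketch(values, 0.5, cap=10))  # interior: estimated from values[::11]
```

With a small cap the interior estimate drifts slightly from the exact median, while the endpoints stay exact because they never touch the subsample.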
📖 Docs Preview: preview of the documentation for this PR: https://smokeshow.helpmanual.io/21612l3n3o356s4l2i24/ (built from 54e89a3)
Pull request overview
This pull request updates Normalize/RescaleIntensity to avoid torch.quantile() failures on very large images (> 2**24 voxels) by introducing a _quantile helper that uses exact min/max for endpoint percentiles and a deterministic strided subsample for interior percentiles. This keeps default behavior working on high-resolution volumes while preserving existing behavior for tensors under the PyTorch quantile size limit.
Changes:
- Route percentile computation through a new `_quantile()` helper that avoids `torch.quantile()` on 0%/100% and subsamples oversized tensors for interior quantiles.
- Add tests that exercise endpoint percentiles, interior percentiles on oversized tensors, and a full `RescaleIntensity` call on a large tensor.
- Normalize `CNAME` formatting (no functional code impact).
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/torchio/transforms/intensity/normalize.py | Adds _quantile() and uses it in _percentile_range() to handle tensors exceeding PyTorch’s quantile size limit. |
| tests/test_normalize.py | Adds coverage for large-tensor percentile behavior and a RescaleIntensity regression test. |
| CNAME | Trims/normalizes formatting of the domain entry. |
Cast the (possibly subsampled) values to float only after striding, so an oversized non-float32 tensor never materializes a full-size float copy.
if sample.numel() > _MAX_QUANTILE_ELEMENTS:
    step = sample.numel() // _MAX_QUANTILE_ELEMENTS + 1
    sample = sample[::step]
[Generated by a coding agent]
Fixed in 64e1559. Subsampling now targets ~1e6 values (_QUANTILE_SUBSAMPLE_SIZE = 1_000_000), well under the 2**24 limit, so torch.quantile runs on at most ~1M elements and stays fast.
def test_percentile_range_interior_subsamples(self) -> None:
    from torchio.transforms.intensity.normalize import _percentile_range

    values = torch.linspace(0.0, 100.0, self.LIMIT + 1000).reshape(1, -1, 1, 1)
    low, high = _percentile_range(values, None, 25.0, 75.0, "t1")
    assert 24.0 < low < 26.0
    assert 74.0 < high < 76.0
[Generated by a coding agent]
Fixed in 64e1559. The interior-subsample test now monkeypatches _QUANTILE_SUBSAMPLE_SIZE to a small value and uses a small tensor, so it exercises the strided-subsample branch quickly. The >2**24 integration is still covered by the endpoint and full RescaleIntensity tests (which use the min/max path).
Subsample oversized inputs to ~1e6 values (well under torch.quantile's 2**24 limit) so percentile estimation stays fast and low-memory on high-resolution volumes. Exercise the subsample branch in tests via a small monkeypatched cap instead of building a >2**24 tensor.
if q <= 0:
    return float(values.min().item())
if q >= 1:
    return float(values.max().item())
[Generated by a coding agent]
Fixed in 10ea2fd. The fast path now only triggers for exact q == 0 / q == 1 (min/max). Out-of-range percentiles fall through to torch.quantile, which validates q and raises as before. Added a regression test (test_invalid_percentile_still_raises).
if sample.numel() > _QUANTILE_SUBSAMPLE_SIZE:
    step = sample.numel() // _QUANTILE_SUBSAMPLE_SIZE + 1
    sample = sample[::step]
return float(torch.quantile(sample.float(), q).item())
[Generated by a coding agent]
This is a deliberate trade-off (the previous review round asked to cap the work going into torch.quantile at ~1e6 to keep it fast and bounded), so subsampling above ~1e6 stays. I documented the approximation explicitly at the public Normalize API: the 0/100 endpoints are exact (min/max), and interior percentiles of very large images are estimated from a deterministic subsample (10ea2fd).
Treat only q==0 and q==1 as exact min/max endpoints so out-of-range percentiles still fall through to torch.quantile and raise (instead of being silently clamped). Document at the public Normalize API that interior percentiles of very large images are estimated from a subsample.
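The difference between clamping (`q <= 0`, `q >= 1`) and strict endpoint matching can be illustrated with a small pure-Python sketch. The function names are hypothetical, the explicit range check is only a stand-in for the validation that `torch.quantile` performs, and the nearest-rank interior is used purely for brevity.

```python
def endpoint_fast_path(values, q):
    """Return min/max only for the exact endpoints; None means 'use the general path'."""
    if q == 0:
        return float(min(values))
    if q == 1:
        return float(max(values))
    return None


def checked_quantile(values, q):
    fast = endpoint_fast_path(values, q)
    if fast is not None:
        return fast
    if not 0 <= q <= 1:
        # Stand-in for the library's own validation of q.
        raise ValueError(f"q must be in [0, 1], got {q}")
    ordered = sorted(values)
    return float(ordered[round(q * (len(ordered) - 1))])  # nearest rank, for brevity
```

With a clamping fast path, `q = 1.5` would silently return the maximum; with the strict version it reaches the validation and raises.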
Build the oversized input with zeros and a single max voxel and assert sentinel voxels, avoiding extra full-tensor min/max reductions.
if sample.numel() > _QUANTILE_SUBSAMPLE_SIZE:
    step = sample.numel() // _QUANTILE_SUBSAMPLE_SIZE + 1
    sample = sample[::step]
[Generated by a coding agent]
Fixed in a7d7fb6. Switched the subsample stride to ceiling division (step = -(-sample.numel() // _QUANTILE_SUBSAMPLE_SIZE)), so the strided subsample stays as close to the cap as possible while never exceeding it (the old numel // target + 1 undersampled, e.g. ~2/3 of the cap for exact multiples). Added a test asserting the subsample size stays within the cap.
`numel // target + 1` undersamples (e.g. ~2/3 of the cap for exact multiples). Ceiling division keeps the subsample as close to the target size as possible while still never exceeding it.
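The arithmetic behind this change can be checked with a few lines of plain Python. The helper name `subsample_len` is made up for the illustration; it only counts how many elements a stride of `step` would keep.

```python
def subsample_len(numel, target, ceil_div):
    """Length of values[::step] under the two stride formulas discussed above."""
    step = -(-numel // target) if ceil_div else numel // target + 1
    return len(range(0, numel, step))


for numel in (1001, 2000, 3000, 10_007):
    old = subsample_len(numel, 1000, ceil_div=False)
    new = subsample_len(numel, 1000, ceil_div=True)
    print(f"numel={numel}: floor+1 keeps {old}, ceiling keeps {new}")
```

For `numel = 2000` and a target of 1000, the old formula gives a stride of 3 and keeps 667 values, whereas ceiling division gives a stride of 2 and keeps exactly 1000.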
def test_quantile_subsample_stays_within_cap(
    self,
    monkeypatch: pytest.MonkeyPatch,
) -> None:
    # The strided subsample must never exceed the target size, including
    # for exact multiples of the cap.
    from torchio.transforms.intensity import normalize as norm

    target = 1000
    monkeypatch.setattr(norm, "_QUANTILE_SUBSAMPLE_SIZE", target)
    for numel in (target + 1, 2 * target, 3 * target, 10 * target + 7):
        values = torch.arange(numel, dtype=torch.float32)
        step = -(-values.numel() // target)
        assert values[::step].numel() <= target
[Generated by a coding agent]
Fixed in cec0d4d. The test now monkeypatches torch.quantile with a spy that records the numel() of the tensor actually passed in, and asserts every recorded size stays within the cap. This fails if _quantile() changes its subsampling strategy (or stops subsampling) rather than re-implementing the stride math.
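The spy pattern described here can be sketched without torch or pytest. Everything below is hypothetical scaffolding: `lib` stands in for the patched library module, `estimate` for the code under test, and `unittest.mock.patch.object` plays the role that `monkeypatch.setattr` plays in the actual test.

```python
import types
from unittest import mock

# Hypothetical namespace standing in for a library module with a quantile().
lib = types.SimpleNamespace(
    quantile=lambda values, q: sorted(values)[int(q * (len(values) - 1))]
)


def estimate(values, q, cap):
    """Code under test: subsample to at most `cap` values, then delegate."""
    step = -(-len(values) // cap) if len(values) > cap else 1
    return lib.quantile(values[::step], q)


def recorded_sizes(numels, cap):
    """Run estimate() with a spy that records the length of every delegated input."""
    sizes = []
    real = lib.quantile

    def spy(values, q):
        sizes.append(len(values))
        return real(values, q)

    with mock.patch.object(lib, "quantile", spy):
        for numel in numels:
            estimate(list(range(numel)), 0.5, cap)
    return sizes


sizes = recorded_sizes((1001, 2000, 3000, 10_007), cap=1000)
assert all(size <= 1000 for size in sizes)
```

Because the assertion looks at what was actually passed to the delegate, it would fail if `estimate` stopped subsampling, which a test that re-derives the stride on its own cannot detect.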
Note:
    The `0` and `100` percentiles are computed exactly (as
    the min and max). For very large images, *interior*
    percentiles are estimated from a deterministic strided
    subsample to keep the computation fast and bounded in
    memory.
[Generated by a coding agent]
Fixed in cec0d4d. Flattened the nested Note: block into plain parameter text under percentile_high to avoid docstring-parser ambiguity and keep Note: reserved for top-level sections.
- Subsample-cap test now spies on torch.quantile to assert the actual tensor length passed in stays within the cap, so it fails if the subsampling strategy changes.
- Out-of-range quantile test accepts ValueError as well as RuntimeError, since torch's exception type is not guaranteed across versions.
- Flatten the nested Note: block in the percentile_high docstring into plain parameter text for consistent rendering.
data = torch.zeros(self.LIMIT + 1000).reshape(1, -1, 1, 1)
data[0, 0, 0, 0] = 1000.0
result = tio.RescaleIntensity(out_min=0, out_max=1)(tio.ScalarImage(data))
assert result.data[0, 0, 0, 0].item() == pytest.approx(1.0, abs=1e-4)
assert result.data[0, 1, 0, 0].item() == pytest.approx(0.0, abs=1e-4)
[Generated by a coding agent]
Fixed in 54f7b14. The test now passes copy=False so the oversized ScalarImage is not deep-copied before transforming, removing the extra full-size allocation while keeping the end-to-end coverage.
Pass copy=False so the >2**24-element ScalarImage is not deep-copied before transforming, reducing the test's memory footprint.
Extract _subsample_for_quantile and add a subsample_size parameter to _quantile and _percentile_range (defaulting to _QUANTILE_SUBSAMPLE_SIZE). Tests now inject a small cap directly and assert the cap bound on the real production helper, removing the module-constant monkeypatch and the torch.quantile spy.
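A minimal sketch of the parameter-injection design, using a list in place of a tensor. The constant name follows the PR; the function name without the leading underscore and the list-based body are assumptions made for the example, not the torchio code.

```python
_QUANTILE_SUBSAMPLE_SIZE = 1_000_000  # module-level default, as named in the PR


def subsample_for_quantile(values, target=_QUANTILE_SUBSAMPLE_SIZE):
    """Return `values` unchanged if small enough, else a strided subsample."""
    if len(values) <= target:
        return values
    step = -(-len(values) // target)  # ceiling division
    return values[::step]


def test_subsample_stays_within_injected_cap():
    cap = 1000  # injected directly; no module constant is patched
    for numel in (cap + 1, 2 * cap, 3 * cap, 10 * cap + 7):
        sample = subsample_for_quantile(list(range(numel)), target=cap)
        assert len(sample) <= cap


test_subsample_stays_within_injected_cap()
```

Passing the cap as an argument lets the test run the production helper on small inputs, so neither a monkeypatched constant nor a spy is needed.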
if values.numel() <= target:
    return values
step = -(-values.numel() // target)
return values[::step]
[Generated by a coding agent]
Fixed in 65c3091. _subsample_for_quantile now raises a clear ValueError for a non-positive target (instead of a ZeroDivisionError or odd negative-stride behaviour) now that subsample_size is caller-exposed. Added a parametrized test for target in {0, -1}.
Now that subsample_size is exposed to callers, _subsample_for_quantile raises ValueError for a non-positive target instead of a ZeroDivisionError or odd negative-stride behaviour.
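The failure modes that motivate the guard follow directly from the ceiling-division formula. The sketch below uses a hypothetical `ceil_step` function and an illustrative error message; the real helper's wording may differ.

```python
def ceil_step(numel, target):
    """Ceiling-division stride with an explicit guard on `target`."""
    if target <= 0:
        raise ValueError(f"subsample size must be positive, got {target}")
    return -(-numel // target)


# Without the guard, the bare formula misbehaves for non-positive targets:
#   -(-10 // 0)   raises ZeroDivisionError
#   -(-10 // -1)  evaluates to -10, i.e. a negative stride
print(ceil_step(10, 3))
```

Raising early gives callers a message about the argument they passed instead of an arithmetic error from inside the helper.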
# Exercise the subsample branch quickly via a small cap passed
# through the public parameter (no monkeypatching).
[Generated by a coding agent]
Fixed in 9d96e4a. Reworded the comment to say the cap is injected through the helper's subsample_size argument, avoiding any implication that _percentile_range is public API.
_percentile_range is an internal helper, not public API; reword the comment to say the cap is injected through its subsample_size argument.
@copilot undo all changes and use this instead (add in the docstring a link to @ego-thales' solution): pytorch/pytorch#157431 (comment)
[Generated by a coding agent] Done in c672080. Reverted the previous strided-subsample approach and replaced it with @ego-thales'
Description
`torch.quantile` raises `RuntimeError: quantile() input tensor is too large` for inputs with more than `2**24` (16,777,216) elements. `RescaleIntensity`/`Normalize` computes its input range with `torch.quantile`, so it fails on high-resolution volumes.

This replaces the `torch.quantile` call in `_percentile_range` with a small `_quantile` helper built on `torch.kthvalue`, which has no `2**24`-element limit and is much faster on large tensors. The helper computes a single quantile of a 1D tensor with linear interpolation to reproduce the default `torch.quantile` behaviour, validates that `q` is within `[0, 1]`, and its docstring links to @ego-thales' solution from pytorch/pytorch#157431 (comment). Results are exact for all tensor sizes (no subsampling/approximation), so behaviour is unchanged for tensors that previously worked.

`TestQuantile` adds coverage for parity with `torch.quantile` on a small tensor, the `ValueError` for out-of-range `q`, an interior quantile on a `>2**24` tensor, and a full `RescaleIntensity` run on an image that exceeds the limit. Lint, format, and tests pass.

Checklist
CONTRIBUTING docs and have a developer setup ready
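As a rough illustration of the order-statistic approach named in the Description, the index arithmetic for a linearly interpolated quantile can be written in plain Python. This is a sketch under stated assumptions, not the PR's `_quantile`: `kth_smallest` is a `sorted()`-based stand-in for `torch.kthvalue` (which also takes a 1-indexed `k`), and both function names are hypothetical.

```python
import math


def kth_smallest(values, k):
    """1-indexed k-th smallest value; a sorted() stand-in for a selection routine."""
    return sorted(values)[k - 1]


def quantile_linear(values, q):
    """Single quantile of a 1D sequence with linear interpolation between ranks."""
    if not 0 <= q <= 1:
        raise ValueError(f"q must be in [0, 1], got {q}")
    pos = q * (len(values) - 1)  # fractional 0-based rank
    lo, hi = math.floor(pos), math.ceil(pos)
    v_lo = kth_smallest(values, lo + 1)
    v_hi = kth_smallest(values, hi + 1)
    return v_lo + (v_hi - v_lo) * (pos - lo)


print(quantile_linear([4, 1, 3, 2], 0.5))  # interpolates between the 2nd and 3rd smallest
```

Only two order statistics are needed per quantile, which is why a selection routine can replace a full quantile call while still giving exact results.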