Craig clean by craigwarner-ufastro · Pull Request #145 · dstndstn/tractor

craigwarner-ufastro · 2026-05-18T19:34:56Z

Merge "craig_clean" into craig_factored_merge

craig_clean differs from craig_factored_merge in two ways: 1) it adds new GPU logic for point source fitting, tryUpdates, and GPULsqrOptimizer; 2) it removes unused branches and options, as well as print and timing statements.

Logs from craig_clean updates:
Commit with full debug info for the refactor that GPU-izes point source fitting and tryUpdates and improves GPULsqrOptimizer for background fitting. Commit message generated by AI below:

Subject: Optimize GPU fitting pipelines and improve memory/edge-case robustness
Description:

This commit implements a series of significant performance optimizations and stability fixes for the GPU-accelerated fitting pipelines, focusing on tryUpdates, GPULsqrOptimizer, and handling of complex image stacks.

Key Changes:

GPU Point Source Fitter & Batching:
- Implemented a hybrid strategy for getSingleImageUpdateDirection in which CPU-based model images (including Lanczos shifting) are batched into 3D arrays before GPU processing.
- Achieved a 2x speedup in update-direction computation while maintaining full numerical correctness.
- Optimized getBatchModelImages to handle cases where the allderivs and tractor.images counts differ.
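
The hybrid batching idea can be sketched roughly as follows: per-parameter derivative images are rendered on the CPU, stacked into one 3D array, and reduced in a single batched operation. This is a minimal sketch under stated assumptions, not the PR's getSingleImageUpdateDirection: the name `batched_update_direction` is hypothetical, a weighted Gauss-Newton least-squares solve stands in for whatever the real code computes, and NumPy is used so it runs anywhere (the `xp` argument is where an array module such as CuPy could be substituted, assuming it mirrors the NumPy calls used).

```python
import numpy as np

def batched_update_direction(derivs, data, model, inverr, xp=np):
    """Hypothetical Gauss-Newton step from CPU-rendered derivative images.

    derivs : list of 2D arrays, one per parameter (e.g. after Lanczos shifting)
    data, model, inverr : 2D arrays with the same shape as each derivative
    xp : array module; NumPy here, assumed swappable for a GPU module
    """
    # Stack the per-parameter 2D images into one (n_params, H, W) block so the
    # device receives a single contiguous array instead of many small ones.
    J = xp.asarray(np.stack(derivs))
    w = xp.asarray(inverr) ** 2
    r = xp.asarray(data) - xp.asarray(model)
    # Weighted normal equations for all parameters at once:
    # A = J^T W J (n_params x n_params), b = J^T W r (n_params,)
    A = xp.einsum('ihw,jhw,hw->ij', J, J, w)
    b = xp.einsum('ihw,hw,hw->i', J, w, r)
    # Least-squares solve also tolerates a singular A (degenerate parameters).
    step, *_ = xp.linalg.lstsq(A, b, rcond=None)
    return step
```

If the data are exactly `model + 0.5 * d0 - 2.0 * d1`, the returned step is `[0.5, -2.0]`.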

GPU tryUpdates for Point Sources & Galaxies:
- Migrated galaxy tryUpdates to the GPU, achieving a 3x speedup over the CPU version.
- Switched point source tryUpdates to float64 precision to eliminate occasional divergence from CPU results.
- Eliminated redundant CPU getLogProb() calls by performing the batch calculations entirely on the GPU, yielding an additional 33% speedup.
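
Batching the line search means rendering candidate models for all trial step sizes and scoring them in one vectorized reduction, instead of one getLogProb() call per step. Below is a minimal NumPy sketch, assuming a Gaussian likelihood logL = -0.5 * sum(chi**2) and ignoring priors; `batched_log_likelihood` and `best_step` are hypothetical names, not the tractor API.

```python
import numpy as np

def batched_log_likelihood(models, data, inverr, dtype=np.float64):
    """Log-likelihood of every candidate model in one vectorized pass.

    models : array (n_candidates, H, W), one rendered model per trial step
    data, inverr : arrays (H, W), broadcast against every candidate
    Assumes a Gaussian likelihood, logL = -0.5 * sum(chi**2), no priors.
    """
    models = np.asarray(models, dtype=dtype)
    chi = (np.asarray(data, dtype=dtype) - models) * np.asarray(inverr, dtype=dtype)
    # Reduce over the pixel axes only, leaving one value per candidate.
    return -0.5 * np.sum(chi ** 2, axis=(1, 2))

def best_step(alphas, models, data, inverr, logprob0, dtype=np.float64):
    """Return (alpha, logprob) of the best improving step, else (None, logprob0)."""
    logprobs = batched_log_likelihood(models, data, inverr, dtype)
    i = int(np.argmax(logprobs))
    if logprobs[i] > logprob0:
        return alphas[i], float(logprobs[i])
    return None, logprob0
```

The float64 default reflects the precision change described above. Near a total log-likelihood of -1e7, adjacent float32 values are 1.0 apart, so `np.float32(-1e7 - 0.1) == np.float32(-1e7)` is true and a genuine 0.1 improvement becomes invisible. That is one plausible source of float32/CPU divergence, not something the PR states.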

Memory-Efficient ("Less Mem") GPU Mode:
- Introduced a new memory-prediction helper that checks available VRAM before execution.
- Implemented a sequential image-looping fallback within the GPU kernels for large blobs that exceed memory limits, preventing crashes while retaining GPU acceleration.
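
A memory-prediction check of this kind might look like the sketch below. The byte model (a fixed number of full-size working arrays per image), the safety margin, and the names `predict_bytes` and `choose_mode` are assumptions for illustration. On a real GPU, `free_bytes` could come from `cupy.cuda.runtime.memGetInfo()`, which returns `(free, total)` in bytes.

```python
def predict_bytes(n_images, height, width, n_arrays=4, itemsize=8):
    """Rough VRAM estimate for processing n_images at once.

    n_arrays is a hypothetical count of full-size working arrays per image
    (e.g. data, model, inverse error, FFT workspace); itemsize 8 = float64.
    """
    return n_images * height * width * n_arrays * itemsize

def choose_mode(n_images, height, width, free_bytes, safety=0.8, **kw):
    """Return 'batched' if the whole stack fits in free memory, else 'sequential'."""
    if predict_bytes(n_images, height, width, **kw) <= safety * free_bytes:
        return "batched"
    # Too big for one batch: loop over images one at a time on the GPU.
    return "sequential"
```

With 10 images of 100 x 100 pixels the estimate is 3.2 MB, so `choose_mode(10, 100, 100, free_bytes=1_000_000)` returns `'sequential'`.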

GPULsqrOptimizer Enhancements:
- Refactored background and sky fitting to use the new batching logic developed for tryUpdates.
- Optimized sparse matrix accumulation, making it 4x faster.
- Reduced overall algorithm runtime from 292s to 99s (excluding solver time).
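
The PR does not say how the sparse accumulation was sped up. One common approach, sketched here with SciPy on the CPU as a stand-in for the GPU path, is to collect each image's (rows, cols, vals) triplets in lists, concatenate them once, and build the matrix with a single COO-to-CSR conversion rather than growing a sparse matrix incrementally. `accumulate_sparse` and `solve_update` are hypothetical names.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.linalg import lsqr

def accumulate_sparse(blocks, shape):
    """Build one sparse matrix from many (rows, cols, vals) blocks.

    Each block might come from one image's derivatives. Concatenating the
    index arrays once and converting COO -> CSR a single time avoids
    re-allocating a growing sparse matrix; duplicate (row, col) entries are
    summed during the conversion.
    """
    rows = np.concatenate([b[0] for b in blocks])
    cols = np.concatenate([b[1] for b in blocks])
    vals = np.concatenate([b[2] for b in blocks])
    return coo_matrix((vals, (rows, cols)), shape=shape).tocsr()

def solve_update(blocks, shape, b):
    """Least-squares update x minimizing ||A x - b|| via LSQR."""
    A = accumulate_sparse(blocks, shape)
    return lsqr(A, b)[0]
```

In the usage below, two blocks that both write to entry (1, 1) sum to an identity matrix.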

Handling None modelMasks & Edge Cases:
- Added logic to filter out images where modelMask is None (e.g., objects off the edge of a segment).
- The tractor context is now dynamically updated to include only valid images before GPU fitting proceeds.
- Ensured a full CPU fallback if all modelMasks in a stack are None.
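
The filtering step can be sketched as below: drop every image whose model mask is None, and report when nothing is left so the caller can take the full CPU path. `split_valid_images` is a hypothetical helper, not a tractor function.

```python
def split_valid_images(images, masks):
    """Keep only images whose model mask is not None.

    A None mask (e.g. a source off the edge of the segment) means the source
    does not overlap that image, so the image is skipped for GPU fitting.
    Returns (valid_images, valid_masks, use_cpu_fallback).
    """
    pairs = [(img, m) for img, m in zip(images, masks) if m is not None]
    if not pairs:
        # Nothing left to batch on the GPU: signal a full CPU fallback.
        return [], [], True
    valid_images, valid_masks = map(list, zip(*pairs))
    return valid_images, valid_masks, False
```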

Performance Impact:
- Total fitblobs runtime for large test blobs (e.g., Blob 1 of 0001p000) reduced from 5842s (CPU) to 1975s (GPU).
- Bricks with long runtimes run up to 2x faster than with previous GPU versions.
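
For scale, the two runtime reductions quoted above both work out to speedups of roughly 3x:

```python
def speedup(before_s, after_s):
    """Ratio of old to new runtime (2.0 means twice as fast)."""
    return before_s / after_s

# Figures quoted in the description above.
lsqr_section_speedup = speedup(292, 99)    # listed under GPULsqrOptimizer, excluding solver time
fitblobs_speedup = speedup(5842, 1975)     # Blob 1 of 0001p000, CPU -> GPU
print(f"{lsqr_section_speedup:.2f}x, {fitblobs_speedup:.2f}x")  # prints "2.95x, 2.96x"
```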


 Optimize GPU memory usage and improve robustness in engine and optimizer

 This commit introduces a memory-efficient GPU processing mode and improves
 the reliability of the GPU-accelerated fitting paths.

 tractor/engine.py:
 - Add 'use_less_mem' and 'ie_stack' support to log-likelihood batch methods.
 - Implement sequential image processing in getLogLikelihoodBatch to minimize
   VRAM footprint for large FFT workspaces.
 - Add manual memory management (CuPy block clearing) during batch operations.
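
Sequential processing trades some throughput for a lower peak footprint, because only one image's temporaries are alive at a time. A minimal sketch assuming Gaussian chi-squared terms: `log_likelihood_sequential` is a hypothetical name, and `free_blocks` is an optional hook where, with CuPy, `cupy.get_default_memory_pool().free_all_blocks` could be passed to release cached blocks between images.

```python
import numpy as np

def log_likelihood_sequential(datas, models, inverrs, free_blocks=None):
    """Sum Gaussian log-likelihoods over images, one image at a time.

    Peak memory scales with the largest single image rather than the whole
    stack. free_blocks: optional callable invoked after each image.
    """
    total = 0.0
    for data, model, inverr in zip(datas, models, inverrs):
        chi = (data - model) * inverr
        total += -0.5 * float((chi ** 2).sum())
        del chi  # drop the temporary before releasing cached memory
        if free_blocks is not None:
            free_blocks()
    return total
```

For equal-sized images the result matches a fully batched `np.stack` computation.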

 tractor/factored_optimizer.py:
 - Filter out non-overlapping images (None masks) before GPU tryUpdates.
 - Implement 'use_less_mem' logic in GPUFriendlyOptimizer to handle large
   image stacks by iterating through valid images when VRAM is low.
 - Add robust state restoration (images/masks) using try...finally blocks.
 - Improve error reporting with tracebacks and fallback to CPU on GPU failure.
 - Add debug logging for linear algebra internals and improve NaN handling.
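
The state-restoration and fallback pattern can be sketched as follows. `Ctx`, `run_gpu_with_fallback`, and the `keep` index list are hypothetical. The key detail is ordering: the `finally` clause puts the original image and mask lists back before the CPU fallback runs, so the CPU path always sees the unfiltered context.

```python
import traceback

class Ctx:
    """Hypothetical stand-in for a Tractor-like object."""
    def __init__(self, images, masks):
        self.images = images
        self.modelMasks = masks

def run_gpu_with_fallback(ctx, keep, gpu_fn, cpu_fn):
    """Run gpu_fn on the images listed in keep, falling back to cpu_fn on error.

    keep: indices of images to use on the GPU (e.g. those whose mask is not None).
    """
    saved_images, saved_masks = ctx.images, ctx.modelMasks
    try:
        ctx.images = [saved_images[i] for i in keep]
        ctx.modelMasks = [saved_masks[i] for i in keep]
        return gpu_fn(ctx)
    except Exception:
        # Report the GPU failure with a full traceback, then fall through.
        traceback.print_exc()
    finally:
        # Runs on success and on failure: restore the original lists.
        ctx.images, ctx.modelMasks = saved_images, saved_masks
    # Reached only when gpu_fn raised; ctx has already been restored.
    return cpu_fn(ctx)
```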


craigwarner-ufastro added 7 commits May 14, 2026 14:06

- 92e6fe3 Cleaned up unused options, unused methods, and a lot of timing and debugging prints.
- 7a584e1 Removed more prints
- 2245c23 Merge branch 'craig_factored_merge' into craig_clean (resolved merge conflicts)
- 4490de2 Added test_batch.py
- eeec894 Updated default GPU mode and comments

craigwarner-ufastro wants to merge 7 commits into craig_factored_merge from craig_clean
