Craig clean #145
Open
craigwarner-ufastro wants to merge 7 commits into
Commit with full debug info for refactor to GPU-ize point source
fitting, tryUpdates, and improve GPULsqrOptimizer for background
fitting. Commit message generated by AI below:
Subject: Optimize GPU fitting pipelines and improve memory/edge-case robustness
Description:
This commit implements a series of significant performance optimizations and stability fixes for the GPU-accelerated fitting pipelines, focusing on tryUpdates, GPULsqrOptimizer, and handling of complex image stacks.
Key Changes:
GPU Point Source Fitter & Batching:
Implemented a hybrid strategy for getSingleImageUpdateDirection where CPU-based model images (including Lanczos shifting) are batched into 3D arrays before GPU processing.
Achieved a 2x speedup in update direction computation while maintaining full numerical correctness.
Optimized getBatchModelImages to handle cases where allderivs and tractor.images counts differ.
GPU tryUpdates for Point Sources & Galaxies:
Migrated galaxy tryUpdates to the GPU, achieving a 3x speedup compared to the CPU version.
Transitioned to float64 precision for point source tryUpdates to eliminate occasional divergence from CPU results.
Eliminated redundant CPU getLogProb() calls by performing batch calculations entirely on the GPU, yielding an additional 33% speedup.
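A small sketch of why the float64 switch can matter. It is not taken from this PR: it uses only the Python standard library, and the helper name `to_float32` and the log-probability values are hypothetical. A float32 carries a 24-bit significand, so a small improvement on top of a large log-probability can round away entirely, and a step comparison in tryUpdates could then disagree with the float64 CPU result.

```python
import struct


def to_float32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Hypothetical values: a large baseline log-prob and a candidate step
# that improves it by 1.
base_logprob = 16777216.0  # 2**24, exactly representable in float32
improvement = 1.0

# float64 keeps the improvement.
new64 = base_logprob + improvement

# float32 cannot represent 2**24 + 1, so the sum rounds back to the baseline.
new32 = to_float32(to_float32(base_logprob) + to_float32(improvement))

accepted64 = new64 > base_logprob              # step looks better
accepted32 = new32 > to_float32(base_logprob)  # step looks like no change
```

Here `accepted64` is True while `accepted32` is False, which is the kind of CPU/GPU disagreement the precision change is meant to avoid.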
Memory-Efficient ("Less Mem") GPU Mode:
Introduced a new memory-prediction helper to check available VRAM before execution.
Implemented a sequential image-looping fallback within the GPU kernels for large blobs that exceed memory limits, preventing crashes while retaining GPU acceleration.
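The memory check and fallback described above might look roughly like the following. This is a sketch under stated assumptions, not the PR's helper: the names `predict_stack_bytes` and `choose_mode`, the `workspace_factor` multiplier, and the `safety` margin are all hypothetical. In a CuPy setting the free-memory figure could come from `cupy.cuda.runtime.memGetInfo()`; here it is passed in as a plain integer so the example runs without a GPU.

```python
def predict_stack_bytes(shapes, itemsize=8, workspace_factor=3.0):
    """Estimate bytes needed to process all images in one batch.

    shapes: list of (height, width), one per image.
    itemsize: bytes per element (8 for float64).
    workspace_factor: assumed multiplier for temporaries such as FFT workspaces.
    """
    if not shapes:
        return 0
    # A batched 3D array is sized to the largest image in the stack.
    max_h = max(h for h, w in shapes)
    max_w = max(w for h, w in shapes)
    return int(len(shapes) * max_h * max_w * itemsize * workspace_factor)


def choose_mode(shapes, free_bytes, safety=0.8):
    """Pick 'batch', 'sequential' (one image at a time), or 'cpu'."""
    budget = free_bytes * safety
    if predict_stack_bytes(shapes) <= budget:
        return "batch"
    if all(predict_stack_bytes([s]) <= budget for s in shapes):
        return "sequential"
    return "cpu"


# Hypothetical blob: 50 images of 2000 x 2000 pixels.
shapes = [(2000, 2000)] * 50
mode_big_gpu = choose_mode(shapes, free_bytes=16 * 1024**3)   # 16 GiB free
mode_small_gpu = choose_mode(shapes, free_bytes=2 * 1024**3)  # 2 GiB free
```

With these assumed numbers the full stack needs about 4.8 GB, so the 16 GiB case batches everything and the 2 GiB case falls back to looping over images one at a time.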
GPULsqrOptimizer Enhancements:
Refactored background and sky fitting to use the new batching logic developed for tryUpdates.
Optimized sparse matrix accumulation, making the process 4x faster.
Overall algorithm runtime reduced from 292s to 99s (excluding solver time).
Handling None modelMasks & Edge Cases:
Added logic to filter out images where modelMask is None (e.g., objects off the edge of a segment).
The tractor context is now dynamically updated to only include valid images before proceeding with GPU fitting.
Ensured full CPU fallback if all modelMasks in a stack are None.
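The None-mask handling above can be sketched as follows. The function names and the `gpu_fit` / `cpu_fit` callables are hypothetical stand-ins rather than tractor APIs. The sketch only illustrates the control flow: drop images whose modelMask is None, run the GPU path on what remains, and use the CPU path when nothing remains.

```python
def select_valid_images(images, model_masks):
    """Keep only the (image, mask) pairs whose modelMask is not None."""
    pairs = [(img, mm) for img, mm in zip(images, model_masks) if mm is not None]
    valid_images = [img for img, _ in pairs]
    valid_masks = [mm for _, mm in pairs]
    return valid_images, valid_masks


def fit_with_fallback(images, model_masks, gpu_fit, cpu_fit):
    """Run gpu_fit on the valid subset, or cpu_fit if every mask is None."""
    valid_images, valid_masks = select_valid_images(images, model_masks)
    if not valid_images:
        return cpu_fit(images, model_masks)
    return gpu_fit(valid_images, valid_masks)


# Stand-in fitters that just report which path ran and on which images.
def fake_gpu_fit(images, masks):
    return ("gpu", images)


def fake_cpu_fit(images, masks):
    return ("cpu", images)


images = ["img0", "img1", "img2"]
mixed = fit_with_fallback(images, ["m0", None, "m2"], fake_gpu_fit, fake_cpu_fit)
all_none = fit_with_fallback(images, [None, None, None], fake_gpu_fit, fake_cpu_fit)
```

In the mixed case only `img0` and `img2` reach the GPU path; in the all-None case the CPU path receives the original, unfiltered stack.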
Performance Impact:
Total fitblobs runtime for large test blobs (e.g., Blob 1 of 0001p000) reduced from 5842s (CPU) to 1975s (GPU).
Bricks with long runtimes run up to 2x faster than with previous GPU versions.
…izer
This commit introduces a memory-efficient GPU processing mode and improves the reliability of the GPU-accelerated fitting paths.
tractor/engine.py:
- Add 'use_less_mem' and 'ie_stack' support to log-likelihood batch methods.
- Implement sequential image processing in getLogLikelihoodBatch to minimize VRAM footprint for large FFT workspaces.
- Add manual memory management (CuPy block clearing) during batch operations.
tractor/factored_optimizer.py:
- Filter out non-overlapping images (None masks) before GPU tryUpdates.
- Implement 'use_less_mem' logic in GPUFriendlyOptimizer to handle large image stacks by iterating through valid images when VRAM is low.
- Add robust state restoration (images/masks) using try...finally blocks.
- Improve error reporting with tracebacks and fallback to CPU on GPU failure.
- Add debug logging for linear algebra internals and improve NaN handling.
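The try...finally state restoration mentioned above could be sketched like this. `FakeTractor`, `restricted_images`, and `gpu_step_that_fails` are hypothetical names for illustration and are not part of tractor. The idea is that the image list is narrowed for the GPU call and always put back, even when the GPU path raises and the caller falls back to the CPU.

```python
from contextlib import contextmanager


class FakeTractor:
    """Stand-in for the tractor object; only the 'images' attribute matters."""

    def __init__(self, images):
        self.images = images


@contextmanager
def restricted_images(tractor, keep):
    """Temporarily restrict tractor.images to the indices in keep."""
    saved = tractor.images
    tractor.images = [saved[i] for i in keep]
    try:
        yield tractor
    finally:
        # Runs even if the body raises, so the caller sees the full list again.
        tractor.images = saved


def gpu_step_that_fails(tractor):
    raise RuntimeError("simulated GPU failure")


trac = FakeTractor(["img0", "img1", "img2"])
used_cpu = False
try:
    with restricted_images(trac, keep=[0, 2]) as t:
        seen_inside = list(t.images)
        gpu_step_that_fails(t)
except RuntimeError:
    used_cpu = True  # a real caller would run the CPU path here
```

After the simulated failure, `trac.images` is the original three-image list again, so a CPU fallback would start from unmodified state.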
debugging prints.
Merged conflicts
Merge "craig clean" into craig_factored_merge
craig_clean differs from craig_factored_merge in two ways: 1) It includes new GPU logic for point source fitting, tryUpdates, and GPULsqrOptimizer. 2) It cleans up unused branches and options, as well as print and timing statements.
Logs from craig_clean updates:
Commit with full debug info for refactor to GPU-ize point source
fitting, tryUpdates, and improve GPULsqrOptimizer for background
fitting. Commit message generated by AI below: