Skip to content

Add runtime requirements, gitignore, and fix typos#1

Open
walkerxiu wants to merge 75 commits into
nlile:mainfrom
walkerxiu:main
Open

Add runtime requirements, gitignore, and fix typos#1
walkerxiu wants to merge 75 commits into
nlile:mainfrom
walkerxiu:main

Conversation

@walkerxiu

@walkerxiu walkerxiu commented May 21, 2026

Copy link
Copy Markdown

Summary by CodeRabbit

Release Notes

  • New Features

    • Added Stable Diffusion v2.1 with Stable unCLIP for image variation generation.
    • Integrated Karlo text-to-image model with super-resolution capabilities.
    • Added support for Intel Extension for PyTorch (IPEX) and TorchScript optimization.
    • Introduced BF16 precision support for improved performance.
    • Added device selection flexibility (CPU/GPU) for inference.
    • Added interactive Streamlit UI for Stable unCLIP.
  • Documentation

    • Updated README with v2 branding and Intel IPEX/TorchScript instructions.
    • Added comprehensive unCLIP and Karlo documentation.
  • Dependencies

    • Updated gradio, added kornia, invisible-watermark, and streamlit-drawable-canvas.

Review Change Stack

lwneal and others added 30 commits November 23, 2022 21:31
Tested with the following:

    python -m venv venv
    source venv/bin/activate
    pip install -U pip
    pip install -r requirements.txt

on Ubuntu 22 with Python 3.8, or similar
Paramterization -> Parameterization
Add standard python gitignore with additions for IDEs and MacOS
* Force cast to fp32 to avoid atten layer overflow
README and ModelCard for 2.1
Remove slash from end of assets/stable-samples/depth2img/d2i.gif
rromb and others added 27 commits February 23, 2023 13:01
Bumps [gradio](https://github.com/gradio-app/gradio) from 3.11 to 3.13.2.
- [Release notes](https://github.com/gradio-app/gradio/releases)
- [Changelog](https://github.com/gradio-app/gradio/blob/main/CHANGELOG.md)
- [Commits](gradio-app/gradio@v3.11.0...v3.13.2)

---
updated-dependencies:
- dependency-name: gradio
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Stable Diffusion 2.1 unCLIP Release
Add diffusers integration to Stable UnCLIP docs
@coderabbitai

coderabbitai Bot commented May 21, 2026

Copy link
Copy Markdown
📝 Walkthrough

Walkthrough

This PR integrates Stable Diffusion v2.1 with unCLIP image conditioning and Kakao's Karlo text-to-image pipeline, adds device-configurable samplers, introduces cross-attention precision control, implements complete diffusion infrastructure, provides Streamlit UI for interactive image generation, and updates configurations/documentation.

Changes

Device Parametrization and Cross-Attention Precision

Layer / File(s) Summary
Sampler device parametrization
ldm/models/diffusion/ddim.py, ldm/models/diffusion/plms.py, ldm/models/diffusion/dpm_solver/sampler.py
DDIMSampler, PLMSSampler, and DPMSolverSampler constructors now accept explicit device parameter and use it for buffer placement instead of hard-coded CUDA device.
Cross-attention precision and UNet bf16 support
ldm/modules/attention.py, ldm/modules/diffusionmodules/openaimodel.py, ldm/modules/diffusionmodules/util.py
CrossAttention forward computes similarity in float32 when ATTN_PRECISION env var is set; UNetModel adds use_bf16 and adm_in_channels parameters for precision/conditioning; squaredcos_cap_v2 noise schedule support added.

Image Embedding Conditioning for Stable unCLIP

Layer / File(s) Summary
ImageEmbeddingConditionedLatentDiffusion and conditioning extensions
ldm/models/diffusion/ddpm.py, ldm/models/diffusion/dpm_solver/dpm_solver.py
New LatentDiffusion subclass that builds c_adm conditioning from image embeddings via optional noise augmentor; DiffusionWrapper supports TorchScript-optimized models; DPM solver supports dict-based conditioning.
Image encoding modules
ldm/modules/encoders/modules.py
ClipImageEmbedder loads OpenAI CLIP vision model; FrozenOpenCLIPImageEmbedder uses OpenCLIP with autocast; CLIPEmbeddingNoiseAugmentation adds configurable noise to CLIP embeddings with timestep embedding.

Karlo Diffusion Infrastructure

Layer / File(s) Summary
Gaussian diffusion math and sampling
ldm/modules/karlo/kakao/modules/diffusion/gaussian_diffusion.py
Implements forward/reverse diffusion processes, q_sample/q_posterior operations, p_mean_variance predictions, and multiple sampling methods (p_sample_loop, DDIM loops) with support for multiple parameterizations and loss types.
Timestep respacing and SpacedDiffusion
ldm/modules/karlo/kakao/modules/diffusion/respace.py
Selects subset of diffusion timesteps via section-based striding; SpacedDiffusion subclass rebuilds schedule and remaps timestep indices during sampling.
Neural network building blocks
ldm/modules/karlo/kakao/modules/nn.py
GroupNorm32, dimension-aware factories (conv_nd, linear, avg_pool_nd), parameter utilities (zero_module, scale_module), timestep embedding, mean_flat helpers.
UNet diffusion model
ldm/modules/karlo/kakao/modules/unet.py
TimestepBlock interface, Upsample/Downsample layers, ResBlock with optional conditioning, AttentionBlock with encoder context, main UNetModel with optional middle attention and specialized variants (SuperResUNetModel, PLMImUNet).
Prior transformer for CLIP conditioning
ldm/modules/karlo/kakao/modules/xf.py
Causal transformer conditioned on CLIP text/image embeddings and diffusion timesteps; includes custom LayerNorm for fp32 stability, multi-head attention with masking, and token projections.
Schedule sampling and Gaussian diffusion factory
ldm/modules/karlo/kakao/modules/resample.py, ldm/modules/karlo/kakao/modules/__init__.py
ScheduleSampler base class with importance-sampling; UniformSampler implementation; create_gaussian_diffusion factory configures and returns SpacedDiffusion instance.

Karlo Model Pipeline

Layer / File(s) Summary
CustomizedCLIP and tokenizer
ldm/modules/karlo/kakao/models/clip.py
CustomizedCLIP exports encode_image/encode_text via TorchScript; load_from_checkpoint reconstructs model from state_dict; CustomizedTokenizer handles padded tokenization with truncation/masking.
PriorDiffusionModel for image feature generation
ldm/modules/karlo/kakao/models/prior_model.py
Produces CLIP image features conditioned on text via diffusion prior; enforces classifier-free guidance; supports DDIM/p-sampling via timestep_respacing.
Text2ImProgressiveModel for 64×64 image generation
ldm/modules/karlo/kakao/models/decoder_model.py
Generates 64×64 images from text features via PLMImUNet with classifier-free guidance and progressive sampling; Text2ImModel wrapper returns final output only.
ImprovedSupRes64to256ProgressiveModel for upscaling
ldm/modules/karlo/kakao/models/sr_64_256.py, ldm/modules/karlo/kakao/models/sr_256_1k.py
Upscales 64×64 to 256×256 using two SuperResUNetModel instances with 7-step respacing; SupRes256to1kProgressive extends parent class.
BaseSampler template and sampler implementations
ldm/modules/karlo/kakao/template.py, ldm/modules/karlo/kakao/sampler.py
BaseSampler loads CLIP/prior/decoder/SR models from checkpoints; T2ISampler runs full prior→decoder→SR pipeline with optional progressive mode; PriorSampler runs prior only.

Diffusers UnCLIPPipeline Integration

Layer / File(s) Summary
UnCLIPPipeline text-to-image orchestration
ldm/modules/karlo/diffusers_pipeline.py
Wires CLIP text encoder, prior, decoder, super-resolution UNets, and schedulers; _encode_prompt handles tokenization and classifier-free guidance; call coordinates three denoising stages (prior→decoder→SR), applies guidance, upsamples, and post-processes output to PIL/NumPy.

Streamlit UI and Script Optimization

Layer / File(s) Summary
Streamlit unCLIP UI and sampling
scripts/streamlit/stableunclip.py
Interactive Streamlit app for generating images via Stable unCLIP with configurable model variants, sampling parameters, optional Karlo prior conditioning, and image output preview/grid saving.
txt2img device and optimization pipeline
scripts/txt2img.py
load_model_from_config accepts configurable device; new CLI flags for --device, --torchscript, --ipex, --bf16; optimization branch extracts/traces submodules with warm-up inference.

Configuration Files and Documentation

Layer / File(s) Summary
Karlo and Stable Diffusion v2 configurations
configs/karlo/*.yaml, configs/stable-diffusion/intel/*.yaml, configs/stable-diffusion/v2-1-stable-unclip-*.yaml
YAML configs for Karlo prior/decoder/SR and SD v2 inference variants (BF16/FP32, v-parameterization, unCLIP-L/H with image embedding and noise augmentation).
Documentation and project setup
doc/UNCLIP.MD, README.md, modelcard.md, requirements.txt, .gitignore, checkpoints/checkpoints.txt
unCLIP documentation with Diffusers and Karlo examples; README updates for v2.1 branding and IPEX/TorchScript docs; model card updates for v2.1 checkpoints; dependency and metadata updates.

🎯 4 (Complex) | ⏱️ ~60 minutes

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

🟠 Major comments (21)
ldm/models/diffusion/plms.py-13-23 (1)

13-23: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Default the sampler device from the model instead of hard-coding CUDA.

This still makes PLMSSampler(model) unsafe on CPU-only runs, because the first registered buffers are forced onto CUDA unless every caller now passes device. The default should resolve from the model and only override when the caller explicitly asks for a different device.

Suggested fix
-    def __init__(self, model, schedule="linear", device=torch.device("cuda"), **kwargs):
+    def __init__(self, model, schedule="linear", device=None, **kwargs):
         super().__init__()
         self.model = model
         self.ddpm_num_timesteps = model.num_timesteps
         self.schedule = schedule
-        self.device = device
+        self.device = model.betas.device if device is None else torch.device(device)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ldm/models/diffusion/plms.py` around lines 13 - 23, The sampler currently
hard-codes device=torch.device("cuda") which forces buffers onto CUDA; change
__init__ to accept device=None and set self.device to the provided device or
infer it from the model (e.g., use model.device if available or the device of
the model's parameters via next(model.parameters()).device) so
PLMSSampler(model) defaults to the model's device; ensure register_buffer
continues to use self.device when moving tensors so buffers go to the inferred
device unless the caller explicitly overrides it.
ldm/models/diffusion/ddim.py-11-21 (1)

11-21: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Default the sampler device from the model instead of hard-coding CUDA.

DDIMSampler(model) will still move schedule buffers to CUDA here, so any CPU-only call path that forgets to pass the new argument still fails at runtime. Default this to model.betas.device/model.device when device is omitted.

Suggested fix
-    def __init__(self, model, schedule="linear", device=torch.device("cuda"), **kwargs):
+    def __init__(self, model, schedule="linear", device=None, **kwargs):
         super().__init__()
         self.model = model
         self.ddpm_num_timesteps = model.num_timesteps
         self.schedule = schedule
-        self.device = device
+        self.device = model.betas.device if device is None else torch.device(device)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ldm/models/diffusion/ddim.py` around lines 11 - 21, The DDIMSampler __init__
currently hardcodes device=torch.device("cuda"), causing register_buffer to move
tensors to CUDA even when model is CPU-only; change the default device to derive
from the model (e.g., use model.device if present, otherwise model.betas.device)
and ensure __init__ sets self.device = device or the derived model device before
any buffer registration so register_buffer uses that device; update references
in DDIMSampler.__init__ and confirm register_buffer logic remains the same but
now compares attr.device to the derived self.device.
ldm/models/diffusion/dpm_solver/sampler.py-13-23 (1)

13-23: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Default the sampler device from the model instead of hard-coding CUDA.

The new parameter is still effectively mandatory for CPU execution. If a caller instantiates DPMSolverSampler(model) without threading device through, register_buffer() will still materialize alphas_cumprod on CUDA and fail on CPU-only environments.

Suggested fix
-    def __init__(self, model, device=torch.device("cuda"), **kwargs):
+    def __init__(self, model, device=None, **kwargs):
         super().__init__()
         self.model = model
-        self.device = device
+        self.device = model.betas.device if device is None else torch.device(device)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ldm/models/diffusion/dpm_solver/sampler.py` around lines 13 - 23, The sampler
currently hard-codes device=torch.device("cuda") which causes register_buffer
and to_torch to place tensors on CUDA even when the provided model or runtime is
CPU-only; change the initialization so the default device is derived from the
model (e.g., set device to model.device when no device is passed), ensure
self.device is assigned from that value, and update the to_torch lambda and
register_buffer behavior to consistently convert tensors to self.device (and not
assume CUDA) when calling self.register_buffer('alphas_cumprod', ...); this
ensures __init__, to_torch, and register_buffer all use the
model/device-consistent device.
ldm/modules/attention.py-173-179 (1)

173-179: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix fp32 attention autocast disabling to target the active device

In ldm/modules/attention.py (lines 173-179), the fp32 branch only does torch.autocast(enabled=False, device_type='cuda'), which disables CUDA autocast but leaves CPU autocast untouched. That means the einsum compute can still be influenced by CPU bf16 autocast even after q, k = q.float(), k.float(), undermining the overflow/precision guard. Disable autocast for q.device.type (or the device where the attention runs) instead of hard-coding 'cuda'.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ldm/modules/attention.py` around lines 173 - 179, The fp32 branch currently
disables autocast only for 'cuda', which misses CPU autocast; update the
torch.autocast context in ldm/modules/attention.py around the _ATTN_PRECISION ==
"fp32" branch to disable autocast for the actual device where q/k live (use
q.device.type or equivalent) so both casting and the einsum are executed without
autocast on the active device (affecting torch.autocast and the block that sets
q, k to float() and computes sim via einsum and self.scale).
ldm/modules/diffusionmodules/openaimodel.py-466-467 (1)

466-467: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

use_bf16 only changes activation dtype; module parameters/buffers are never converted (fp16/f32 converters are dummy no-ops).

  • UNetModel.__init__ sets self.dtype = th.bfloat16 when use_bf16=True, and UNetModel.forward immediately casts activations via h = x.type(self.dtype).
  • The existing dtype conversion helpers convert_module_to_f16 / convert_module_to_f32 are currently pass, so convert_to_fp16() / convert_to_fp32() can’t actually convert layer dtypes.
  • This means bf16 activations will run against fp32-initialized weights unless the caller uses autocast and/or converts the model externally, so the use_bf16 flag doesn’t reliably establish a bf16 UNet.

Fix by implementing a real bf16 conversion path for parameters/buffers (or by keeping internal activations fp32 and relying on caller-managed autocast).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ldm/modules/diffusionmodules/openaimodel.py` around lines 466 - 467,
UNetModel currently sets self.dtype and casts activations when use_bf16=True,
but convert_module_to_f16/convert_module_to_f32 are no-ops so parameters/buffers
remain fp32; update the conversion path so convert_module_to_f16 and
convert_module_to_f32 actually cast module parameters and buffers (and buffers
like running_mean/var) to torch.bfloat16/torch.float32 respectively (or use
module.to(dtype) where safe), then call these helpers from
convert_to_fp16()/convert_to_fp32 so UNetModel weights match activations when
use_bf16 is set (keep UNetModel.__init__/forward behavior intact); ensure you
reference and update functions convert_module_to_f16, convert_module_to_f32,
convert_to_fp16, convert_to_fp32, and UNetModel to perform the full
parameter/buffer dtype conversion.
scripts/streamlit/stableunclip.py-57-74 (1)

57-74: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard the ADM image path until an upload exists.

load_img() assumes the uploader already returned an image, but make_conditionings_from_input() is called during page render. In the default path this hits image.size on None before the user can press Sample.

🛠️ Suggested guard
 def load_img(display=True, key=None):
     image = get_interactive_image(key=key)
+    if image is None:
+        return None
     if display:
         st.image(image)
     w, h = image.size
@@
 def get_init_img(batch_size=1, key=None):
-    init_image = load_img(key=key).cuda()
+    init_image = load_img(key=key)
+    if init_image is None:
+        return None
+    init_image = init_image.cuda()
     init_image = repeat(init_image, '1 ... -> b ...', b=batch_size)
     return init_image
@@
         def make_conditionings_from_input(num=1, key=None):
             init_img = get_init_img(batch_size=number_cols, key=key)
+            if init_img is None:
+                return None, None, None
             with torch.no_grad():
                 adm_cond = state["model"].embedder(init_img)
@@
         for n in range(num_inputs):
             adm_cond, adm_uc, w = make_conditionings_from_input(num=n + 1, key=n)
+            if adm_cond is None:
+                st.info("Upload all input images before sampling.")
+                break
             weights.append(w)
             adm_inputs.append(adm_cond)
-        adm_cond = torch.stack(adm_inputs).sum(0) / sum(weights)
+        if adm_inputs:
+            adm_cond = torch.stack(adm_inputs).sum(0) / sum(weights)

Also applies to: 338-360

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/streamlit/stableunclip.py` around lines 57 - 74, load_img currently
assumes get_interactive_image returned an image and will dereference image.size
causing a crash during initial render; update load_img to check for a missing
upload (image is None) and gracefully return None (or a neutral tensor) instead
of proceeding, and update get_init_img to detect a None return from load_img and
either return None or a zeroed/placeholder CUDA tensor so callers like
make_conditionings_from_input can handle "no image" without error; reference
functions load_img and get_init_img when making these changes.
ldm/modules/karlo/kakao/modules/resample.py-42-56 (1)

42-56: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Return sampled timesteps on the requested device.

sample() accepts device, but Line 52 never applies it. If the sampler stays on CPU while the training step asks for CUDA timesteps, indices and weights come back on CPU and the next gather/model call fails on mixed devices. Move w or the outputs to device before returning them.

Suggested fix
     def sample(self, batch_size, device):
         """
         Importance-sample timesteps for a batch.
@@
-        w = self.weights()
+        w = self.weights().to(device=device)
         p = w / th.sum(w)
         indices = p.multinomial(batch_size, replacement=True)
         weights = 1 / (len(p) * p[indices])
         return indices, weights
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ldm/modules/karlo/kakao/modules/resample.py` around lines 42 - 56, The
sample() method currently computes w = self.weights(), normalizes p and draws
indices on the sampler's default device, but never moves the results to the
requested device; update sample() so that either w (from self.weights()) is
created/moved to the requested device or the outputs are moved before return:
ensure p is on device, compute indices on that device (or call indices =
indices.to(device)) and compute weights on device (or weights =
weights.to(device)) so the returned indices and weights are on the supplied
device; adjust the code references inside sample(), including self.weights(), p,
indices, and weights.
ldm/modules/karlo/kakao/modules/diffusion/gaussian_diffusion.py-423-455 (1)

423-455: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard model_kwargs before unpacking in the conditioning hooks.

Line 431 and Line 448 unpack model_kwargs unconditionally. With the current default model_kwargs=None, any guided sampling path that supplies cond_fn but no extra kwargs raises TypeError on the first step. Normalize it to {} in both helpers so the documented default stays valid.

Suggested fix
     def condition_mean(self, cond_fn, p_mean_var, x, t, model_kwargs=None):
+        if model_kwargs is None:
+            model_kwargs = {}
         """
         Compute the mean for the previous step, given a function cond_fn that
         computes the gradient of a conditional log probability with respect to
         x. In particular, cond_fn computes grad(log(p(y|x))), and we want to
         condition on y.
@@
     def condition_score(self, cond_fn, p_mean_var, x, t, model_kwargs=None):
+        if model_kwargs is None:
+            model_kwargs = {}
         """
         Compute what the p_mean_variance output would have been, should the
         model's score function be conditioned by cond_fn.
         See condition_mean() for details on cond_fn.
         Unlike condition_mean(), this instead uses the conditioning strategy
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ldm/modules/karlo/kakao/modules/diffusion/gaussian_diffusion.py` around lines
423 - 455, In both condition_mean and condition_score, guard against
model_kwargs being None by normalizing it to an empty dict before it's unpacked;
e.g., at the top of condition_mean and condition_score set model_kwargs =
model_kwargs or {} (or equivalent) so calls like cond_fn(x, t, **model_kwargs)
and any other unpacking won't raise TypeError when model_kwargs is not provided;
update the two functions (condition_mean, condition_score) to use this
normalized dict before calling cond_fn or passing model_kwargs to other helpers.
ldm/modules/karlo/kakao/modules/unet.py-746-758 (1)

746-758: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

clip_emb_drop is a no-op right now.

forward() never calls proc_clip_emb_drop(), so enabling classifier-free dropout does not change y at all. Also, cf_param is created with th.empty(...), which leaves garbage values if/when the drop path is wired up.

Proposed fix
         if self.clip_emb_drop > 0:
-            self.cf_param = nn.Parameter(th.empty(self.clip_dim, dtype=th.float32))
+            self.cf_param = nn.Parameter(th.zeros(self.clip_dim, dtype=th.float32))
@@
     def forward(
         self, x, timesteps, txt_feat=None, txt_feat_seq=None, mask=None, y=None
     ):
         bsz = x.shape[0]
         hs = []
+        y = self.proc_clip_emb_drop(y)
         emb = self.time_embed(timestep_embedding(timesteps, self.model_channels))
         emb = emb + self.clip_emb(y)

Also applies to: 761-780

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ldm/modules/karlo/kakao/modules/unet.py` around lines 746 - 758, The
classifier-free dropout is currently a no-op because forward() never calls
proc_clip_emb_drop(), and cf_param is uninitialized garbage from th.empty; fix
by calling proc_clip_emb_drop(y) (or the appropriate CLIP feature tensor named y
where conditioning is applied) inside forward() at the point classifier-free
dropout should be applied so the feature can be replaced when drop triggers, and
change cf_param creation in the constructor to a deterministic initializer
(e.g., zeros of shape self.clip_dim with correct dtype) so replacements are
well-defined; apply the same fixes for the duplicate implementation referenced
around proc_clip_emb_drop and its constructor initialization (cf_param,
clip_emb_drop, clip_dim).
ldm/modules/karlo/kakao/modules/unet.py-140-150 (1)

140-150: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

use_checkpoint is exposed but currently does nothing.

This flag is documented, stored, and passed into every block, but none of the forward paths actually call checkpointing. On models this large, callers can set use_checkpoint=True and still hit the same activation-memory ceiling/OOMs.

Also applies to: 334-346, 451-475

ldm/modules/karlo/kakao/modules/unet.py-681-689 (1)

681-689: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix the positional-args path in SuperResUNetModel.

The positional branch doubles args[1], which is model_channels, not in_channels. Any positional instantiation will build the wrong network and still leave the input-channel contract broken.

Proposed fix
         else:
             # Curse you, Python. Or really, just curse positional arguments :|.
             args = list(args)
-            args[1] = args[1] * 2
+            args[0] = args[0] * 2
         super().__init__(*args, **kwargs)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ldm/modules/karlo/kakao/modules/unet.py` around lines 681 - 689, The
positional-args branch in SuperResUNetModel.__init__ currently multiplies
args[1] (which is model_channels) instead of the in_channels positional
argument; change the branch so it converts args to a list and multiplies the
element corresponding to the in_channels parameter (the positional index of
in_channels in the __init__ signature) by 2 before calling super().__init__;
i.e., replace args[1] = args[1] * 2 with code that updates
args[<index_of_in_channels>] = args[<index_of_in_channels>] * 2 so the actual
in_channels positional value is doubled.
ldm/modules/karlo/kakao/modules/xf.py-195-223 (1)

195-223: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

The optional mask parameters are not actually optional.

mask=None fails at F.pad(...), and causal_mask=None fails when building the additive mask. Either construct defaults here or make both arguments required so callers do not hit an avoidable runtime error.

Proposed fix
     def forward(
         self,
         x,
         timesteps,
         text_emb=None,
         text_enc=None,
         mask=None,
         causal_mask=None,
     ):
         bsz = x.shape[0]
+        if mask is None or causal_mask is None:
+            raise ValueError("mask and causal_mask must be provided")
         mask = F.pad(mask, (0, self.ext_len), value=True)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ldm/modules/karlo/kakao/modules/xf.py` around lines 195 - 223, The forward
method treats mask and causal_mask as required but they can be None; ensure safe
defaults: if mask is None create a boolean tensor of True with shape (bsz,
text_enc.size(1) + 4) (text_enc tokens + text_emb + t_emb + x + prd_emb) on
x.device/dtype before calling F.pad(self.ext_len), and if causal_mask is None
create a zero additive mask (zeros on the same device/dtype) shaped to broadcast
with mask[:, None, :], e.g. (1, 1, input_seq_len_after_padding) or (1,
input_seq_len_after_padding) so the later expression (mask[:, None, :] +
causal_mask) works; add these checks near the top of forward (before F.pad and
the causal_mask addition) and reference symbols: forward, mask, causal_mask,
F.pad, self.ext_len, text_enc, prd_emb, and mask[:, None, :].
ldm/modules/karlo/diffusers_pipeline.py-126-143 (1)

126-143: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Truncate text_mask together with text_input_ids.

On long prompts, Lines 135-141 slice text_input_ids after Line 133 already built text_mask. That leaves the mask longer than last_hidden_state, so the prior/decoder attention path can break as soon as a prompt exceeds model_max_length.

Suggested fix
             text_input_ids = text_inputs.input_ids
-            text_mask = text_inputs.attention_mask.bool().to(device)

             if text_input_ids.shape[-1] > self.tokenizer.model_max_length:
                 removed_text = self.tokenizer.batch_decode(text_input_ids[:, self.tokenizer.model_max_length :])
                 logger.warning(
                     "The following part of your input was truncated because CLIP can only handle sequences up to"
                     f" {self.tokenizer.model_max_length} tokens: {removed_text}"
                 )
                 text_input_ids = text_input_ids[:, : self.tokenizer.model_max_length]

+            text_mask = text_inputs.attention_mask[:, : text_input_ids.shape[-1]].bool().to(device)
             text_encoder_output = self.text_encoder(text_input_ids.to(device))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ldm/modules/karlo/diffusers_pipeline.py` around lines 126 - 143, The code
slices text_input_ids when it exceeds self.tokenizer.model_max_length but leaves
text_mask unchanged, causing a length mismatch; update the truncation block so
after you set text_input_ids = text_input_ids[:, :
self.tokenizer.model_max_length] you also truncate text_mask to the same width
(e.g., text_mask = text_mask[:, :
self.tokenizer.model_max_length].bool().to(device) or preserve its current
dtype/device), ensuring text_mask and text_input_ids passed into text_encoder
and downstream attention share the same sequence length (refer to variables
text_input_ids, text_mask, tokenizer.model_max_length and the call
text_encoder(text_input_ids.to(device))).
ldm/modules/karlo/diffusers_pipeline.py-311-329 (1)

311-329: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate the prompt / text_model_output contract before deriving batch_size.

Line 319 indexes text_model_output unconditionally when prompt is None, so pipeline() currently crashes with a NoneType error instead of a clear validation error. The same block also accepts text_model_output without text_attention_mask, which then fails later in _encode_prompt(), and it lets prompt and text_model_output disagree on batch size because this method uses the prompt-derived size while _encode_prompt() uses the embedding-derived size.

Suggested fix
-        if prompt is not None:
+        if prompt is None:
+            if text_model_output is None or text_attention_mask is None:
+                raise ValueError(
+                    "Pass either `prompt` or both `text_model_output` and `text_attention_mask`."
+                )
+            batch_size = text_model_output[0].shape[0]
+        else:
             if isinstance(prompt, str):
                 batch_size = 1
             elif isinstance(prompt, list):
                 batch_size = len(prompt)
             else:
                 raise ValueError(f"`prompt` has to be of type `str` or `list` but is {type(prompt)}")
-        else:
-            batch_size = text_model_output[0].shape[0]
+
+            if text_model_output is not None and text_model_output[0].shape[0] != batch_size:
+                raise ValueError("`prompt` and `text_model_output` must have the same batch size.")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ldm/modules/karlo/diffusers_pipeline.py` around lines 311 - 329, When prompt
is None, the code currently indexes text_model_output[0] and may accept
text_model_output without text_attention_mask, causing unclear crashes and
batch-size mismatches with _encode_prompt; update pipeline() to validate the
prompt/text_model_output contract: if prompt is None require text_model_output
and text_attention_mask to be non-None, derive batch_size from
text_model_output[0].shape[0] (not by indexing when None), and if both prompt
and text_model_output are supplied ensure their batch sizes agree (raise a clear
ValueError on mismatch); also when prompt is provided but text_model_output is
present, verify consistency before calling _encode_prompt(); reference symbols:
prompt, text_model_output, text_attention_mask, batch_size, _encode_prompt,
pipeline().
ldm/modules/encoders/modules.py-323-329 (1)

323-329: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

clip_stats_path=None only works for 256-d embeddings.

The fallback data_mean/data_std are sized from timestep_dim (default 256), but forward() applies them to the actual CLIP embedding width. Any encoder that does not emit 256 features will shape-mismatch here. Using scalar 0/1 defaults, or a separate embedding-dim parameter, keeps the no-stats path generic.

Suggested fix
         if clip_stats_path is None:
-            clip_mean, clip_std = torch.zeros(timestep_dim), torch.ones(timestep_dim)
+            clip_mean, clip_std = torch.zeros(1), torch.ones(1)
         else:
             clip_mean, clip_std = torch.load(clip_stats_path, map_location="cpu")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ldm/modules/encoders/modules.py` around lines 323 - 329, The fallback when
clip_stats_path is None assumes timestep_dim (256) which breaks when CLIP
embeddings have a different width; update the fallback in the constructor so
that data_mean and data_std are created to match the actual CLIP embedding width
used in forward() instead of hardcoding timestep_dim: either use scalar defaults
(torch.tensor(0.) / torch.tensor(1.)) registered as buffers so broadcasting
works, or accept/derive an explicit embedding_dim parameter and construct
clip_mean/clip_std of size embedding_dim before calling
self.register_buffer("data_mean", ...) and self.register_buffer("data_std",
...); ensure the symbols clip_stats_path, timestep_dim, data_mean, data_std and
forward() are consistent after the change.
ldm/modules/karlo/kakao/models/decoder_model.py-101-113 (1)

101-113: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix the invalid default for timestep_respacing.

forward() defaults timestep_respacing to None, but get_sample_fn() unconditionally calls startswith() on it. That makes the default call path unusable. Either resolve None to the configured respacing or require the argument explicitly.

Also applies to: 115-124, 154-155

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ldm/modules/karlo/kakao/models/decoder_model.py` around lines 101 - 113, The
code calls timestep_respacing.startswith(...) in get_sample_fn but forward
defaults timestep_respacing to None, causing an AttributeError; change
get_sample_fn (and the similar places at the other startswith calls) to first
resolve a None input to the configured respacing (e.g. timestep_respacing =
timestep_respacing or self._diffusion_kwargs.get("timestep_respacing") or a
sensible default) before calling startswith, then proceed to call
create_gaussian_diffusion(**diffusion_kwargs) and select
diffusion.ddim_sample_loop_progressive vs diffusion.p_sample_loop_progressive;
alternatively, if you prefer explicitness, raise a clear ValueError when
timestep_respacing is None instead of resolving it.
ldm/modules/karlo/kakao/sampler.py-95-101 (1)

95-101: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make the default progressive_mode valid.

Both __call__ methods default progressive_mode to None and then immediately reject None, so a plain sampler(prompt, bsz) call always raises. Default this to "final" or treat None as the final-only mode.

Also applies to: 242-248

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ldm/modules/karlo/kakao/sampler.py` around lines 95 - 101, The __call__
methods currently default progressive_mode to None and immediately assert it's
one of ("loop","stage","final"), causing plain calls to fail; change the default
to "final" or add a guard that treats None as "final" before the assert in the
__call__ implementations (refer to the __call__ at lines around the first
occurrence and the second __call__ around lines 242-248) so that
progressive_mode is valid when omitted; ensure you update both locations (the
sampler.__call__ methods) to either use progressive_mode: str = "final" or set
progressive_mode = "final" when None prior to the assert.
ldm/modules/karlo/kakao/models/prior_model.py-72-80 (1)

72-80: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle the default timestep_respacing path.

forward() exposes timestep_respacing=None, but get_sample_fn() immediately calls startswith() on it. A caller that relies on the default will hit AttributeError before sampling begins. Default this to self._diffusion_kwargs["timestep_respacing"] or reject None explicitly.

Also applies to: 90-98, 126-127

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ldm/modules/karlo/kakao/models/prior_model.py` around lines 72 - 80, The code
calls timestep_respacing.startswith(...) without guarding for None; update
get_sample_fn to first set timestep_respacing = timestep_respacing or
self._diffusion_kwargs.get("timestep_respacing") (or explicitly raise a clear
ValueError if None is unacceptable), then proceed to compute use_ddim and build
diffusion_kwargs/create_gaussian_diffusion and choose diffusion.ddim_sample_loop
vs diffusion.p_sample_loop; make the same fix in the other places that call
.startswith (e.g., the forward caller that exposes timestep_respacing=None and
any other methods around the create_gaussian_diffusion / diffusion sample
selection blocks) so None is handled consistently.
ldm/modules/karlo/kakao/template.py-87-92 (1)

87-92: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Resolve the config YAML paths before loading them.

These loaders hand raw relative strings to OmegaConf.load(), and the defaults here are also missing the configs/karlo/ prefix used by the added files. That means the defaulted APIs are broken, and even explicit Karlo paths only work when the caller's current working directory happens to be the repo root.

Also applies to: 95-95, 114-114, 117-117, 130-130, 133-133

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ldm/modules/karlo/kakao/template.py` around lines 87 - 92, The loader passes
raw relative paths to OmegaConf.load which breaks unless CWD is repo root; in
load_prior resolve prior_config and clip_stat_path to absolute paths before
calling OmegaConf.load by joining them (when not absolute) to the package's
configs/karlo directory (use Path(__file__).resolve().parent / "configs" /
"karlo" or equivalent) and convert to str; apply the same path-resolution logic
to the other Karlo config/stat parameters referenced elsewhere in this module so
any relative default like "configs/prior_1B_vit_l.yaml" is converted to an
absolute path prior to OmegaConf.load or file reads.
ldm/modules/karlo/kakao/models/clip.py-37-39 (1)

37-39: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid deleting third-party CLIP.forward globally (process-wide mutation)

  • ldm/modules/karlo/kakao/models/clip.py removes clip.model.CLIP.forward at import time (delattr(...)), mutating the external CLIP class for the whole process.
  • CustomizedCLIP.forward(image, text) then calls super().forward(image, text) after that deletion and doesn’t return anything, so any eager use of clip(image, text) will fail/return None.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ldm/modules/karlo/kakao/models/clip.py` around lines 37 - 39, The code
currently deletes the external CLIP.forward via delattr(clip.model.CLIP,
"forward"), causing a process-wide mutation and breaking super() calls; remove
that delattr call and instead implement/use a proper subclass that overrides
forward: keep the LayerNorm assignment (clip.model.LayerNorm = LayerNorm),
ensure CustomizedCLIP inherits from clip.model.CLIP, implement
CustomizedCLIP.forward(self, image, text) to call result =
super().forward(image, text) and return that result (do not drop the return),
and instantiate or register that subclass locally where the model is created
rather than mutating clip.model.CLIP globally.
scripts/txt2img.py-180-185 (1)

180-185: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Disable CPU autocast by default when --device=cpu

--device now defaults to "cpu" while --precision defaults to "autocast", so the no-flag path enables torch.autocast("cpu") via precision_scope(opt.device) even when --bf16 is not set (CPU autocast defaults to bf16).

Suggested fix
-    precision_scope = autocast if opt.precision=="autocast" or opt.bf16 else nullcontext
+    use_cuda_autocast = opt.precision == "autocast" and opt.device == "cuda"
+    use_cpu_bf16 = opt.device == "cpu" and opt.bf16
+    precision_scope = autocast if use_cuda_autocast or use_cpu_bf16 else nullcontext
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/txt2img.py` around lines 180 - 185, The CLI defaults result in
torch.autocast("cpu") being enabled unintentionally; after parsing arguments,
check opt.device and opt.precision and disable CPU autocast when device=="cpu"
by changing the precision to a non-autocast mode (e.g., set
opt.precision="fp32") before calling precision_scope(opt.device); locate the
parser.add_argument block for "--device" and the code that uses
precision_scope(opt.device) and add a small guard like: if opt.device == "cpu"
and opt.precision == "autocast": opt.precision = "fp32" (optionally log a
warning) so CPU runs do not default into bf16/autocast.
🟡 Minor comments (11)
configs/stable-diffusion/intel/v2-inference-fp32.yaml-35-35 (1)

35-35: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Address or remove the flash-attention TODO.

The comment # need to fix for flash-attn indicates incomplete work. Either document what needs fixing for flash-attention compatibility or remove the comment if the current num_head_channels: 64 value is correct.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@configs/stable-diffusion/intel/v2-inference-fp32.yaml` at line 35, The inline
TODO on num_head_channels is incomplete; either remove the comment or replace it
with a clear action. Update the YAML entry for num_head_channels (currently set
to 64) by either deleting the trailing comment "# need to fix for flash-attn" if
64 is confirmed correct, or replace it with a short note documenting the
required value and reasoning for flash-attn compatibility (e.g., required head
size or reference to matching num_heads), so future maintainers know
whether/what needs changing.
configs/karlo/improved_sr_64_256_1.4B.yaml-26-26 (1)

26-26: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clarify the # fix comment.

The inline comment # fix suggests uncertainty about the timestep_respacing value. If '7' is the correct value for 7-step sampling (as used in ImprovedSupRes64to256ProgressiveModel per the stack context), remove the comment. Otherwise, document what needs to be fixed.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@configs/karlo/improved_sr_64_256_1.4B.yaml` at line 26, The inline comment "#
fix" next to timestep_respacing is ambiguous; update the config to either remove
the comment if the value '7' is correct for 7-step sampling used by
ImprovedSupRes64to256ProgressiveModel, or replace the comment with a clear note
describing the intended spacing (e.g., "7-step sampling" or "matches model's
progressive schedule") so readers know whether timestep_respacing = '7' is
intentional or requires adjustment.
configs/stable-diffusion/intel/v2-inference-bf16.yaml-36-36 (1)

36-36: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Address or remove the flash-attention TODO.

The comment # need to fix for flash-attn indicates incomplete work. Either document what needs fixing for flash-attention compatibility or remove the comment if the current num_head_channels: 64 value is correct.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@configs/stable-diffusion/intel/v2-inference-bf16.yaml` at line 36, The inline
TODO on num_head_channels is ambiguous; either remove the comment or replace it
with a precise note about flash-attn requirements. Update the line referencing
num_head_channels to either drop the "# need to fix for flash-attn" comment if
64 is confirmed correct, or replace it with a short, specific instruction
explaining what must change for flash‑attention (e.g., required per-head channel
size, alignment, or compatibility check) and reference the setting name
num_head_channels so future maintainers know what to verify when enabling
flash-attn.
configs/stable-diffusion/intel/v2-inference-v-bf16.yaml-37-37 (1)

37-37: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Address or remove the flash-attention TODO.

The comment # need to fix for flash-attn indicates incomplete work. Either document what needs fixing for flash-attention compatibility or remove the comment if the current num_head_channels: 64 value is correct.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@configs/stable-diffusion/intel/v2-inference-v-bf16.yaml` at line 37, The
inline TODO comment about flash-attention on the num_head_channels setting is
incomplete; update or remove it: if 64 is verified to be correct for flash-attn,
remove the comment and add a short clarifying comment like "compatible with
flash-attn"; otherwise, document the exact change required (e.g., target head
channel size or calculation) and adjust num_head_channels accordingly. Locate
the num_head_channels entry in the config (the line containing
"num_head_channels: 64") and either delete the "# need to fix for flash-attn"
note or replace it with a clear, actionable comment and/or the corrected numeric
value so the config and comment are consistent.
configs/stable-diffusion/v2-1-stable-unclip-h-inference.yaml-45-45 (1)

45-45: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Address or remove the flash-attention TODO.

The comment # need to fix for flash-attn indicates incomplete work. Either document what needs fixing for flash-attention compatibility or remove the comment if the current num_head_channels: 64 value is correct.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@configs/stable-diffusion/v2-1-stable-unclip-h-inference.yaml` at line 45, The
inline TODO on the num_head_channels setting is ambiguous; either remove the
comment or explain what to change for flash-attn compatibility. Update the YAML
entry for num_head_channels: 64 by either deleting the comment "# need to fix
for flash-attn" if 64 is confirmed correct, or replacing it with a short note
that specifies the required value or condition for flash-attn (e.g., expected
head dim, multiple-of constraint, or a link/reference to the calculation),
referencing the num_head_channels key so reviewers can verify it against the
flash-attn requirements.
configs/stable-diffusion/intel/v2-inference-v-fp32.yaml-36-36 (1)

36-36: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Address or remove the flash-attention TODO.

The comment # need to fix for flash-attn indicates incomplete work. Either document what needs fixing for flash-attention compatibility or remove the comment if the current num_head_channels: 64 value is correct.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@configs/stable-diffusion/intel/v2-inference-v-fp32.yaml` at line 36, The
inline TODO on num_head_channels should be resolved: either remove the comment
or replace it with a concrete note describing required changes for flash-attn
compatibility. Locate the num_head_channels: 64 entry and if 64 is correct,
delete the "# need to fix for flash-attn" comment; if not, update the comment to
state the exact required value or transformation (e.g., match head_dim or
num_heads expectations) and reference flash-attn constraints so future readers
know the exact fix needed.
doc/UNCLIP.MD-51-51 (1)

51-51: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix grammatical error.

The word "than" should be "that" in this context.

📝 Proposed fix
-to launch a streamlit script than can be used to make image variations with both models (CLIP-L and OpenCLIP-H).
+to launch a streamlit script that can be used to make image variations with both models (CLIP-L and OpenCLIP-H).

As per static analysis, this matches the LanguageTool grammar check finding.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@doc/UNCLIP.MD` at line 51, Replace the incorrect word "than" with "that" in
the sentence containing "to launch a streamlit script than can be used to make
image variations with both models (CLIP-L and OpenCLIP-H)" in UNCLIP.MD so the
phrase reads "to launch a streamlit script that can be used..." to fix the
grammatical error.
requirements.txt-16-16 (1)

16-16: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update Kornia dependency pin to the latest stable release.

kornia==0.6 is outdated (latest stable is kornia 0.8.2, released 2025-11-08). Upgrade the pin and verify your code/tests against any API changes.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@requirements.txt` at line 16, Update the Kornia dependency pin in
requirements.txt from "kornia==0.6" to the latest stable "kornia==0.8.2"
(replace the existing "kornia==0.6" entry), then run the test suite and any
image-processing integration tests to catch API changes; if failures occur,
adjust usages of Kornia APIs referenced in your codebase (search for
imports/usages of "kornia" and functions/classes that may have changed) to the
new 0.8.2 API.
scripts/streamlit/stableunclip.py-342-360 (1)

342-360: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Reject zero-sum input weights before normalizing.

The sliders allow negative weights, but Line 360 divides by sum(weights) without checking for 0. A simple +1 / -1 mix yields inf/NaN conditioning and unstable sampling.

🧮 Suggested fix
-        adm_cond = torch.stack(adm_inputs).sum(0) / sum(weights)
+        total_weight = sum(weights)
+        if total_weight == 0:
+            st.error("Input weights must not sum to 0.")
+            st.stop()
+        adm_cond = torch.stack(adm_inputs).sum(0) / total_weight
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/streamlit/stableunclip.py` around lines 342 - 360, The code computes
adm_cond = torch.stack(adm_inputs).sum(0) / sum(weights) but doesn’t guard
against sum(weights) == 0; update the logic after building weights/adm_inputs
(around the adm_inputs/weights aggregation and make_conditionings_from_input
usage) to detect total_weight = sum(weights) == 0 and reject/handle it: e.g.,
show an error/message in the Streamlit UI and prevent normalization/sampling
until weights are non‑zero, or fallback to a safe default normalization (like
using 1.0) so you never divide by zero; ensure the check references the weights
list and the adm_inputs aggregation (adm_inputs, weights, adm_cond) so the UI
prevents or corrects zero-sum inputs before calling torch.stack(...)/division.
scripts/streamlit/stableunclip.py-245-249 (1)

245-249: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Initialize msg as a string before appending metadata.

msg starts as None, but Line 249 appends to it whenever model_ema.num_updates exists. Checkpoints without global_step will crash here even though that metadata is optional.

🧾 Suggested fix
-    msg = None
+    msg = ""
     if "global_step" in pl_sd:
         msg = f"This is global step {pl_sd['global_step']}. "
     if "model_ema.num_updates" in pl_sd["state_dict"]:
         msg += f"And we got {pl_sd['state_dict']['model_ema.num_updates']} EMA updates."
@@
-    return model, msg
+    return model, (msg or None)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/streamlit/stableunclip.py` around lines 245 - 249, The bug is that
msg is initialized to None and later appended to when checking
pl_sd["state_dict"]["model_ema.num_updates"], which crashes if "global_step" is
absent; change initialization so msg is a string (e.g., msg = "" or set msg = ""
when creating it) or guard the append by converting None to "" before using +=;
update the variables in this block (msg, pl_sd, pl_sd["state_dict"],
"model_ema.num_updates") so msg is always a string before any concatenation.
scripts/txt2img.py-218-219 (1)

218-219: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fail fast when --device cuda is unavailable.

torch.device("cuda") doesn’t check runtime CUDA availability; it just creates a CUDA device handle, so failures get deferred until model/sampling. Add an early torch.cuda.is_available() guard when --device cuda is requested.

Suggested fix
     config = OmegaConf.load(f"{opt.config}")
     device = torch.device("cuda") if opt.device == "cuda" else torch.device("cpu")
+    if device.type == "cuda" and not torch.cuda.is_available():
+        raise ValueError("--device cuda was requested, but CUDA is not available")
     model = load_model_from_config(config, f"{opt.ckpt}", device)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/txt2img.py` around lines 218 - 219, When opt.device == "cuda" the
code currently constructs torch.device("cuda") without verifying CUDA is
actually available; add an explicit check using torch.cuda.is_available() before
creating device and loading the model (affecting the device variable and the
load_model_from_config call) and fail fast with a clear error/log and exit if
CUDA is not available so load_model_from_config(...) is never called on a
non-functional CUDA device.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: a40fc073-b5f1-407c-a83f-fa7d024a3bfa

📥 Commits

Reviewing files that changed from the base of the PR and between 47b6b60 and cf1d67a.

⛔ Files ignored due to path filters (8)
  • assets/stable-samples/stable-unclip/houses_out.jpeg is excluded by !**/*.jpeg
  • assets/stable-samples/stable-unclip/oldcar000.jpeg is excluded by !**/*.jpeg
  • assets/stable-samples/stable-unclip/oldcar500.jpeg is excluded by !**/*.jpeg
  • assets/stable-samples/stable-unclip/oldcar800.jpeg is excluded by !**/*.jpeg
  • assets/stable-samples/stable-unclip/panda.jpg is excluded by !**/*.jpg
  • assets/stable-samples/stable-unclip/plates_out.jpeg is excluded by !**/*.jpeg
  • assets/stable-samples/stable-unclip/unclip-variations.png is excluded by !**/*.png
  • assets/stable-samples/stable-unclip/unclip-variations_noise.png is excluded by !**/*.png
📒 Files selected for processing (45)
  • .gitignore
  • README.md
  • checkpoints/checkpoints.txt
  • configs/karlo/decoder_900M_vit_l.yaml
  • configs/karlo/improved_sr_64_256_1.4B.yaml
  • configs/karlo/prior_1B_vit_l.yaml
  • configs/stable-diffusion/intel/v2-inference-bf16.yaml
  • configs/stable-diffusion/intel/v2-inference-fp32.yaml
  • configs/stable-diffusion/intel/v2-inference-v-bf16.yaml
  • configs/stable-diffusion/intel/v2-inference-v-fp32.yaml
  • configs/stable-diffusion/v2-1-stable-unclip-h-inference.yaml
  • configs/stable-diffusion/v2-1-stable-unclip-l-inference.yaml
  • doc/UNCLIP.MD
  • ldm/models/diffusion/ddim.py
  • ldm/models/diffusion/ddpm.py
  • ldm/models/diffusion/dpm_solver/dpm_solver.py
  • ldm/models/diffusion/dpm_solver/sampler.py
  • ldm/models/diffusion/plms.py
  • ldm/modules/attention.py
  • ldm/modules/diffusionmodules/openaimodel.py
  • ldm/modules/diffusionmodules/util.py
  • ldm/modules/encoders/modules.py
  • ldm/modules/karlo/__init__.py
  • ldm/modules/karlo/diffusers_pipeline.py
  • ldm/modules/karlo/kakao/__init__.py
  • ldm/modules/karlo/kakao/models/__init__.py
  • ldm/modules/karlo/kakao/models/clip.py
  • ldm/modules/karlo/kakao/models/decoder_model.py
  • ldm/modules/karlo/kakao/models/prior_model.py
  • ldm/modules/karlo/kakao/models/sr_256_1k.py
  • ldm/modules/karlo/kakao/models/sr_64_256.py
  • ldm/modules/karlo/kakao/modules/__init__.py
  • ldm/modules/karlo/kakao/modules/diffusion/gaussian_diffusion.py
  • ldm/modules/karlo/kakao/modules/diffusion/respace.py
  • ldm/modules/karlo/kakao/modules/nn.py
  • ldm/modules/karlo/kakao/modules/resample.py
  • ldm/modules/karlo/kakao/modules/unet.py
  • ldm/modules/karlo/kakao/modules/xf.py
  • ldm/modules/karlo/kakao/sampler.py
  • ldm/modules/karlo/kakao/template.py
  • ldm/util.py
  • modelcard.md
  • requirements.txt
  • scripts/streamlit/stableunclip.py
  • scripts/txt2img.py

image_size: 64
num_channels: 320
num_res_blocks: 3
channel_mult: ''

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

channel_mult should be a list, not an empty string.

The channel_mult parameter is set to an empty string, but UNet architectures expect a list of integer multipliers (e.g., [1, 2, 4, 4] as seen in the Stable Diffusion configs). This will cause a runtime error when the model attempts to iterate over channel multipliers during initialization.

🔧 Proposed fix
-    channel_mult: ''
+    channel_mult: [1, 2, 4, 4]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
channel_mult: ''
channel_mult: [1, 2, 4, 4]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@configs/karlo/decoder_900M_vit_l.yaml` at line 8, The channel_mult parameter
is currently an empty string which will break UNet initialization; update
channel_mult to be a list of integer multipliers (for example, 1,2,4,4) instead
of an empty string so the model can iterate over channel multipliers during
setup; locate the channel_mult entry in the YAML and replace the empty-string
value with a list of integers appropriate for the UNet configuration.

Comment on lines +1813 to +1819
def _init_embedder(self, config, freeze=True):
embedder = instantiate_from_config(config)
if freeze:
self.embedder = embedder.eval()
self.embedder.train = disabled_train
for param in self.embedder.parameters():
param.requires_grad = False

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

freeze_embedder=False is broken.

self.embedder is only assigned inside the frozen branch, so disabling freeze_embedder makes the first self.embedder(img) call fail with AttributeError. If this flag is intended to support fine-tuning, the embedder also needs to stay registered in the non-frozen path.

Suggested fix
     def _init_embedder(self, config, freeze=True):
-        embedder = instantiate_from_config(config)
+        self.embedder = instantiate_from_config(config)
         if freeze:
-            self.embedder = embedder.eval()
+            self.embedder = self.embedder.eval()
             self.embedder.train = disabled_train
             for param in self.embedder.parameters():
                 param.requires_grad = False
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ldm/models/diffusion/ddpm.py` around lines 1813 - 1819, The _init_embedder
function only assigns self.embedder inside the freeze branch, so when
freeze=False self.embedder is never registered and calls like self.embedder(img)
raise AttributeError; fix by assigning self.embedder = embedder before the if
freeze: block (so the module is always registered), and inside the if freeze:
branch call self.embedder = embedder.eval(), set self.embedder.train =
disabled_train and set param.requires_grad = False; when freeze is False, do not
override self.embedder.train or disable gradients so fine-tuning works (leave
parameters' requires_grad untouched). Ensure you reference _init_embedder,
instantiate_from_config, disabled_train, and self.embedder when making the
change.

Comment on lines +1835 to +1837
img = batch[self.embed_key][:bs]
img = rearrange(img, 'b h w c -> b c h w')
c_adm = self.embedder(img)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Move the embedding image onto self.device before encoding.

LatentDiffusion.get_input() moves the VAE/conditioning tensors to the module device, but this branch leaves img on the dataloader device. On GPU/XPU runs, self.embedder(img) will fail with a device mismatch.

Suggested fix
         img = batch[self.embed_key][:bs]
         img = rearrange(img, 'b h w c -> b c h w')
+        img = img.to(self.device, memory_format=torch.contiguous_format).float()
         c_adm = self.embedder(img)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
img = batch[self.embed_key][:bs]
img = rearrange(img, 'b h w c -> b c h w')
c_adm = self.embedder(img)
img = batch[self.embed_key][:bs]
img = rearrange(img, 'b h w c -> b c h w')
img = img.to(self.device, memory_format=torch.contiguous_format).float()
c_adm = self.embedder(img)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ldm/models/diffusion/ddpm.py` around lines 1835 - 1837, The img tensor taken
from batch[self.embed_key] is not moved to the model device before calling
self.embedder, causing device-mismatch errors; before calling self.embedder(img)
(in the block using img and c_adm), move img to self.device (e.g., img =
img.to(self.device)) after rearrange so the embedding runs on the same device as
the module (referencing img, self.embed_key, self.embedder, and
LatentDiffusion.get_input for consistency).

Comment on lines +6 to +10
from ldm.modules.karlo.kakao.models.sr_64_256 import SupRes64to256Progressive


class SupRes256to1kProgressive(SupRes64to256Progressive):
pass # no difference currently

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Fix the broken base-class import.

sr_64_256.py does not define SupRes64to256Progressive, so importing this module will fail immediately. The subclass needs to inherit from the class that actually exists.

Suggested fix
-from ldm.modules.karlo.kakao.models.sr_64_256 import SupRes64to256Progressive
+from ldm.modules.karlo.kakao.models.sr_64_256 import (
+    ImprovedSupRes64to256ProgressiveModel,
+)
 
 
-class SupRes256to1kProgressive(SupRes64to256Progressive):
+class SupRes256to1kProgressive(ImprovedSupRes64to256ProgressiveModel):
     pass  # no difference currently
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
from ldm.modules.karlo.kakao.models.sr_64_256 import SupRes64to256Progressive
class SupRes256to1kProgressive(SupRes64to256Progressive):
pass # no difference currently
from ldm.modules.karlo.kakao.models.sr_64_256 import (
ImprovedSupRes64to256ProgressiveModel,
)
class SupRes256to1kProgressive(ImprovedSupRes64to256ProgressiveModel):
pass # no difference currently
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ldm/modules/karlo/kakao/models/sr_256_1k.py` around lines 6 - 10, The file
imports and subclasses SupRes64to256Progressive which does not exist in
sr_64_256.py; update the import and inheritance so SupRes256to1kProgressive
extends the actual class defined in sr_64_256.py (replace
SupRes64to256Progressive with the real class name exported from sr_64_256.py) —
modify the import statement and the class declaration for
SupRes256to1kProgressive to reference that existing symbol.

Christoballo pushed a commit to Christoballo/stablediffusion that referenced this pull request May 30, 2026
…nchises

Researched what 433 (77M followers) and B/R Football do and codified it:
- franchises.ts: recurring named series with fixed visual cue + predictable
  cadence ('build a show, not a feed') — the top accounts' nlile#1 structural lever.
  Match-day-aware (react fast like 433: Predicted XI -> Matchday Verdict ->
  Stat Bomb) plus a weekly appointment rhythm for non-match days.
- Each episode carries a recognizable caption header, a signature interactive
  CTA (save/send/comment mechanic), and a recurring on-image cue.
- captions.ts: '433-style locker-room' voice for the LLM path; franchise header
  + series CTA in both LLM and template paths.
- planner/visuals/report wired to franchises; new 'wm-hub shows' command.
- STRATEGY.md documents the franchise lineup and the rationale.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.