Add runtime requirements, gitignore, and fix typos#1
Conversation
Tested with the following:
python -m venv venv
source venv/bin/activate
pip install -U pip
pip install -r requirements.txt
on Ubuntu 22 with Python 3.8, or similar
Paramterization -> Parameterization
Add runtime requirements
Add standard python gitignore with additions for IDEs and MacOS
Fix typo in ddpm.py
* Force cast to fp32 to avoid atten layer overflow
README and ModelCard for 2.1
Remove slash from end of assets/stable-samples/depth2img/d2i.gif
Fix image link
Typo found , fixed in README
Bumps [gradio](https://github.com/gradio-app/gradio) from 3.11 to 3.13.2. - [Release notes](https://github.com/gradio-app/gradio/releases) - [Changelog](https://github.com/gradio-app/gradio/blob/main/CHANGELOG.md) - [Commits](gradio-app/gradio@v3.11.0...v3.13.2) --- updated-dependencies: - dependency-name: gradio dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>
Bump gradio from 3.11 to 3.13.2
Stable Diffusion 2.1 unCLIP Release
Add unCLIP finetunes
Add diffusers integration to Stable UnCLIP docs
Fix diffusers code snippet
📝 WalkthroughWalkthroughThis PR integrates Stable Diffusion v2.1 with unCLIP image conditioning and Kakao's Karlo text-to-image pipeline, adds device-configurable samplers, introduces cross-attention precision control, implements complete diffusion infrastructure, provides Streamlit UI for interactive image generation, and updates configurations/documentation. ChangesDevice Parametrization and Cross-Attention Precision
Image Embedding Conditioning for Stable unCLIP
Karlo Diffusion Infrastructure
Karlo Model Pipeline
Diffusers UnCLIPPipeline Integration
Streamlit UI and Script Optimization
Configuration Files and Documentation
🎯 4 (Complex) | ⏱️ ~60 minutes ✨ Finishing Touches🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
Actionable comments posted: 4
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
🟠 Major comments (21)
ldm/models/diffusion/plms.py-13-23 (1)
13-23:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDefault the sampler device from the model instead of hard-coding CUDA.
This still makes
PLMSSampler(model)unsafe on CPU-only runs, because the first registered buffers are forced onto CUDA unless every caller now passesdevice. The default should resolve from the model and only override when the caller explicitly asks for a different device.Suggested fix
- def __init__(self, model, schedule="linear", device=torch.device("cuda"), **kwargs): + def __init__(self, model, schedule="linear", device=None, **kwargs): super().__init__() self.model = model self.ddpm_num_timesteps = model.num_timesteps self.schedule = schedule - self.device = device + self.device = model.betas.device if device is None else torch.device(device)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ldm/models/diffusion/plms.py` around lines 13 - 23, The sampler currently hard-codes device=torch.device("cuda") which forces buffers onto CUDA; change __init__ to accept device=None and set self.device to the provided device or infer it from the model (e.g., use model.device if available or the device of the model's parameters via next(model.parameters()).device) so PLMSSampler(model) defaults to the model's device; ensure register_buffer continues to use self.device when moving tensors so buffers go to the inferred device unless the caller explicitly overrides it.ldm/models/diffusion/ddim.py-11-21 (1)
11-21:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDefault the sampler device from the model instead of hard-coding CUDA.
DDIMSampler(model)will still move schedule buffers to CUDA here, so any CPU-only call path that forgets to pass the new argument still fails at runtime. Default this tomodel.betas.device/model.devicewhendeviceis omitted.Suggested fix
- def __init__(self, model, schedule="linear", device=torch.device("cuda"), **kwargs): + def __init__(self, model, schedule="linear", device=None, **kwargs): super().__init__() self.model = model self.ddpm_num_timesteps = model.num_timesteps self.schedule = schedule - self.device = device + self.device = model.betas.device if device is None else torch.device(device)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ldm/models/diffusion/ddim.py` around lines 11 - 21, The DDIMSampler __init__ currently hardcodes device=torch.device("cuda"), causing register_buffer to move tensors to CUDA even when model is CPU-only; change the default device to derive from the model (e.g., use model.device if present, otherwise model.betas.device) and ensure __init__ sets self.device = device or the derived model device before any buffer registration so register_buffer uses that device; update references in DDIMSampler.__init__ and confirm register_buffer logic remains the same but now compares attr.device to the derived self.device.ldm/models/diffusion/dpm_solver/sampler.py-13-23 (1)
13-23:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDefault the sampler device from the model instead of hard-coding CUDA.
The new parameter is still effectively mandatory for CPU execution. If a caller instantiates
DPMSolverSampler(model)without threadingdevicethrough,register_buffer()will still materializealphas_cumprodon CUDA and fail on CPU-only environments.Suggested fix
- def __init__(self, model, device=torch.device("cuda"), **kwargs): + def __init__(self, model, device=None, **kwargs): super().__init__() self.model = model - self.device = device + self.device = model.betas.device if device is None else torch.device(device)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ldm/models/diffusion/dpm_solver/sampler.py` around lines 13 - 23, The sampler currently hard-codes device=torch.device("cuda") which causes register_buffer and to_torch to place tensors on CUDA even when the provided model or runtime is CPU-only; change the initialization so the default device is derived from the model (e.g., set device to model.device when no device is passed), ensure self.device is assigned from that value, and update the to_torch lambda and register_buffer behavior to consistently convert tensors to self.device (and not assume CUDA) when calling self.register_buffer('alphas_cumprod', ...); this ensures __init__, to_torch, and register_buffer all use the model/device-consistent device.ldm/modules/attention.py-173-179 (1)
173-179:⚠️ Potential issue | 🟠 Major | ⚡ Quick winFix fp32 attention autocast disabling to target the active device
In
ldm/modules/attention.py(lines 173-179), the fp32 branch only doestorch.autocast(enabled=False, device_type='cuda'), which disables CUDA autocast but leaves CPU autocast untouched. That means theeinsumcompute can still be influenced by CPU bf16 autocast even afterq, k = q.float(), k.float(), undermining the overflow/precision guard. Disable autocast forq.device.type(or the device where the attention runs) instead of hard-coding'cuda'.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ldm/modules/attention.py` around lines 173 - 179, The fp32 branch currently disables autocast only for 'cuda', which misses CPU autocast; update the torch.autocast context in ldm/modules/attention.py around the _ATTN_PRECISION == "fp32" branch to disable autocast for the actual device where q/k live (use q.device.type or equivalent) so both casting and the einsum are executed without autocast on the active device (affecting torch.autocast and the block that sets q, k to float() and computes sim via einsum and self.scale).ldm/modules/diffusionmodules/openaimodel.py-466-467 (1)
466-467:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift
use_bf16only changes activation dtype; module parameters/buffers are never converted (fp16/f32 converters are dummy no-ops).
UNetModel.__init__setsself.dtype = th.bfloat16whenuse_bf16=True, andUNetModel.forwardimmediately casts activations viah = x.type(self.dtype).- The existing dtype conversion helpers
convert_module_to_f16/convert_module_to_f32are currentlypass, soconvert_to_fp16()/convert_to_fp32()can’t actually convert layer dtypes.- This means bf16 activations will run against fp32-initialized weights unless the caller uses autocast and/or converts the model externally, so the
use_bf16flag doesn’t reliably establish a bf16 UNet.Fix by implementing a real bf16 conversion path for parameters/buffers (or by keeping internal activations fp32 and relying on caller-managed autocast).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ldm/modules/diffusionmodules/openaimodel.py` around lines 466 - 467, UNetModel currently sets self.dtype and casts activations when use_bf16=True, but convert_module_to_f16/convert_module_to_f32 are no-ops so parameters/buffers remain fp32; update the conversion path so convert_module_to_f16 and convert_module_to_f32 actually cast module parameters and buffers (and buffers like running_mean/var) to torch.bfloat16/torch.float32 respectively (or use module.to(dtype) where safe), then call these helpers from convert_to_fp16()/convert_to_fp32 so UNetModel weights match activations when use_bf16 is set (keep UNetModel.__init__/forward behavior intact); ensure you reference and update functions convert_module_to_f16, convert_module_to_f32, convert_to_fp16, convert_to_fp32, and UNetModel to perform the full parameter/buffer dtype conversion.scripts/streamlit/stableunclip.py-57-74 (1)
57-74:⚠️ Potential issue | 🟠 Major | ⚡ Quick winGuard the ADM image path until an upload exists.
load_img()assumes the uploader already returned an image, butmake_conditionings_from_input()is called during page render. In the default path this hitsimage.sizeonNonebefore the user can press Sample.🛠️ Suggested guard
def load_img(display=True, key=None): image = get_interactive_image(key=key) + if image is None: + return None if display: st.image(image) w, h = image.size @@ def get_init_img(batch_size=1, key=None): - init_image = load_img(key=key).cuda() + init_image = load_img(key=key) + if init_image is None: + return None + init_image = init_image.cuda() init_image = repeat(init_image, '1 ... -> b ...', b=batch_size) return init_image @@ def make_conditionings_from_input(num=1, key=None): init_img = get_init_img(batch_size=number_cols, key=key) + if init_img is None: + return None, None, None with torch.no_grad(): adm_cond = state["model"].embedder(init_img) @@ for n in range(num_inputs): adm_cond, adm_uc, w = make_conditionings_from_input(num=n + 1, key=n) + if adm_cond is None: + st.info("Upload all input images before sampling.") + break weights.append(w) adm_inputs.append(adm_cond) - adm_cond = torch.stack(adm_inputs).sum(0) / sum(weights) + if adm_inputs: + adm_cond = torch.stack(adm_inputs).sum(0) / sum(weights)Also applies to: 338-360
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/streamlit/stableunclip.py` around lines 57 - 74, load_img currently assumes get_interactive_image returned an image and will dereference image.size causing a crash during initial render; update load_img to check for a missing upload (image is None) and gracefully return None (or a neutral tensor) instead of proceeding, and update get_init_img to detect a None return from load_img and either return None or a zeroed/placeholder CUDA tensor so callers like make_conditionings_from_input can handle "no image" without error; reference functions load_img and get_init_img when making these changes.ldm/modules/karlo/kakao/modules/resample.py-42-56 (1)
42-56:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReturn sampled timesteps on the requested device.
sample()acceptsdevice, but Line 52 never applies it. If the sampler stays on CPU while the training step asks for CUDA timesteps,indicesandweightscome back on CPU and the next gather/model call fails on mixed devices. Movewor the outputs todevicebefore returning them.Suggested fix
def sample(self, batch_size, device): """ Importance-sample timesteps for a batch. @@ - w = self.weights() + w = self.weights().to(device=device) p = w / th.sum(w) indices = p.multinomial(batch_size, replacement=True) weights = 1 / (len(p) * p[indices]) return indices, weights🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ldm/modules/karlo/kakao/modules/resample.py` around lines 42 - 56, The sample() method currently computes w = self.weights(), normalizes p and draws indices on the sampler's default device, but never moves the results to the requested device; update sample() so that either w (from self.weights()) is created/moved to the requested device or the outputs are moved before return: ensure p is on device, compute indices on that device (or call indices = indices.to(device)) and compute weights on device (or weights = weights.to(device)) so the returned indices and weights are on the supplied device; adjust the code references inside sample(), including self.weights(), p, indices, and weights.ldm/modules/karlo/kakao/modules/diffusion/gaussian_diffusion.py-423-455 (1)
423-455:⚠️ Potential issue | 🟠 Major | ⚡ Quick winGuard
model_kwargsbefore unpacking in the conditioning hooks.Line 431 and Line 448 unpack
model_kwargsunconditionally. With the current defaultmodel_kwargs=None, any guided sampling path that suppliescond_fnbut no extra kwargs raisesTypeErroron the first step. Normalize it to{}in both helpers so the documented default stays valid.Suggested fix
def condition_mean(self, cond_fn, p_mean_var, x, t, model_kwargs=None): + if model_kwargs is None: + model_kwargs = {} """ Compute the mean for the previous step, given a function cond_fn that computes the gradient of a conditional log probability with respect to x. In particular, cond_fn computes grad(log(p(y|x))), and we want to condition on y. @@ def condition_score(self, cond_fn, p_mean_var, x, t, model_kwargs=None): + if model_kwargs is None: + model_kwargs = {} """ Compute what the p_mean_variance output would have been, should the model's score function be conditioned by cond_fn. See condition_mean() for details on cond_fn. Unlike condition_mean(), this instead uses the conditioning strategy🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ldm/modules/karlo/kakao/modules/diffusion/gaussian_diffusion.py` around lines 423 - 455, In both condition_mean and condition_score, guard against model_kwargs being None by normalizing it to an empty dict before it's unpacked; e.g., at the top of condition_mean and condition_score set model_kwargs = model_kwargs or {} (or equivalent) so calls like cond_fn(x, t, **model_kwargs) and any other unpacking won't raise TypeError when model_kwargs is not provided; update the two functions (condition_mean, condition_score) to use this normalized dict before calling cond_fn or passing model_kwargs to other helpers.ldm/modules/karlo/kakao/modules/unet.py-746-758 (1)
746-758:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
clip_emb_dropis a no-op right now.
forward()never callsproc_clip_emb_drop(), so enabling classifier-free dropout does not changeyat all. Also,cf_paramis created withth.empty(...), which leaves garbage values if/when the drop path is wired up.Proposed fix
if self.clip_emb_drop > 0: - self.cf_param = nn.Parameter(th.empty(self.clip_dim, dtype=th.float32)) + self.cf_param = nn.Parameter(th.zeros(self.clip_dim, dtype=th.float32)) @@ def forward( self, x, timesteps, txt_feat=None, txt_feat_seq=None, mask=None, y=None ): bsz = x.shape[0] hs = [] + y = self.proc_clip_emb_drop(y) emb = self.time_embed(timestep_embedding(timesteps, self.model_channels)) emb = emb + self.clip_emb(y)Also applies to: 761-780
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ldm/modules/karlo/kakao/modules/unet.py` around lines 746 - 758, The classifier-free dropout is currently a no-op because forward() never calls proc_clip_emb_drop(), and cf_param is uninitialized garbage from th.empty; fix by calling proc_clip_emb_drop(y) (or the appropriate CLIP feature tensor named y where conditioning is applied) inside forward() at the point classifier-free dropout should be applied so the feature can be replaced when drop triggers, and change cf_param creation in the constructor to a deterministic initializer (e.g., zeros of shape self.clip_dim with correct dtype) so replacements are well-defined; apply the same fixes for the duplicate implementation referenced around proc_clip_emb_drop and its constructor initialization (cf_param, clip_emb_drop, clip_dim).ldm/modules/karlo/kakao/modules/unet.py-140-150 (1)
140-150:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift
use_checkpointis exposed but currently does nothing.This flag is documented, stored, and passed into every block, but none of the forward paths actually call checkpointing. On models this large, callers can set
use_checkpoint=Trueand still hit the same activation-memory ceiling/OOMs.Also applies to: 334-346, 451-475
ldm/modules/karlo/kakao/modules/unet.py-681-689 (1)
681-689:⚠️ Potential issue | 🟠 Major | ⚡ Quick winFix the positional-args path in
SuperResUNetModel.The positional branch doubles
args[1], which ismodel_channels, notin_channels. Any positional instantiation will build the wrong network and still leave the input-channel contract broken.Proposed fix
else: # Curse you, Python. Or really, just curse positional arguments :|. args = list(args) - args[1] = args[1] * 2 + args[0] = args[0] * 2 super().__init__(*args, **kwargs)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ldm/modules/karlo/kakao/modules/unet.py` around lines 681 - 689, The positional-args branch in SuperResUNetModel.__init__ currently multiplies args[1] (which is model_channels) instead of the in_channels positional argument; change the branch so it converts args to a list and multiplies the element corresponding to the in_channels parameter (the positional index of in_channels in the __init__ signature) by 2 before calling super().__init__; i.e., replace args[1] = args[1] * 2 with code that updates args[<index_of_in_channels>] = args[<index_of_in_channels>] * 2 so the actual in_channels positional value is doubled.ldm/modules/karlo/kakao/modules/xf.py-195-223 (1)
195-223:⚠️ Potential issue | 🟠 Major | ⚡ Quick winThe optional mask parameters are not actually optional.
mask=Nonefails atF.pad(...), andcausal_mask=Nonefails when building the additive mask. Either construct defaults here or make both arguments required so callers do not hit an avoidable runtime error.Proposed fix
def forward( self, x, timesteps, text_emb=None, text_enc=None, mask=None, causal_mask=None, ): bsz = x.shape[0] + if mask is None or causal_mask is None: + raise ValueError("mask and causal_mask must be provided") mask = F.pad(mask, (0, self.ext_len), value=True)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ldm/modules/karlo/kakao/modules/xf.py` around lines 195 - 223, The forward method treats mask and causal_mask as required but they can be None; ensure safe defaults: if mask is None create a boolean tensor of True with shape (bsz, text_enc.size(1) + 4) (text_enc tokens + text_emb + t_emb + x + prd_emb) on x.device/dtype before calling F.pad(self.ext_len), and if causal_mask is None create a zero additive mask (zeros on the same device/dtype) shaped to broadcast with mask[:, None, :], e.g. (1, 1, input_seq_len_after_padding) or (1, input_seq_len_after_padding) so the later expression (mask[:, None, :] + causal_mask) works; add these checks near the top of forward (before F.pad and the causal_mask addition) and reference symbols: forward, mask, causal_mask, F.pad, self.ext_len, text_enc, prd_emb, and mask[:, None, :].ldm/modules/karlo/diffusers_pipeline.py-126-143 (1)
126-143:⚠️ Potential issue | 🟠 Major | ⚡ Quick winTruncate
text_masktogether withtext_input_ids.On long prompts, Lines 135-141 slice
text_input_idsafter Line 133 already builttext_mask. That leaves the mask longer thanlast_hidden_state, so the prior/decoder attention path can break as soon as a prompt exceedsmodel_max_length.Suggested fix
text_input_ids = text_inputs.input_ids - text_mask = text_inputs.attention_mask.bool().to(device) if text_input_ids.shape[-1] > self.tokenizer.model_max_length: removed_text = self.tokenizer.batch_decode(text_input_ids[:, self.tokenizer.model_max_length :]) logger.warning( "The following part of your input was truncated because CLIP can only handle sequences up to" f" {self.tokenizer.model_max_length} tokens: {removed_text}" ) text_input_ids = text_input_ids[:, : self.tokenizer.model_max_length] + text_mask = text_inputs.attention_mask[:, : text_input_ids.shape[-1]].bool().to(device) text_encoder_output = self.text_encoder(text_input_ids.to(device))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ldm/modules/karlo/diffusers_pipeline.py` around lines 126 - 143, The code slices text_input_ids when it exceeds self.tokenizer.model_max_length but leaves text_mask unchanged, causing a length mismatch; update the truncation block so after you set text_input_ids = text_input_ids[:, : self.tokenizer.model_max_length] you also truncate text_mask to the same width (e.g., text_mask = text_mask[:, : self.tokenizer.model_max_length].bool().to(device) or preserve its current dtype/device), ensuring text_mask and text_input_ids passed into text_encoder and downstream attention share the same sequence length (refer to variables text_input_ids, text_mask, tokenizer.model_max_length and the call text_encoder(text_input_ids.to(device))).ldm/modules/karlo/diffusers_pipeline.py-311-329 (1)
311-329:⚠️ Potential issue | 🟠 Major | ⚡ Quick winValidate the
prompt/text_model_outputcontract before derivingbatch_size.Line 319 indexes
text_model_outputunconditionally whenpromptisNone, sopipeline()currently crashes with aNoneTypeerror instead of a clear validation error. The same block also acceptstext_model_outputwithouttext_attention_mask, which then fails later in_encode_prompt(), and it letspromptandtext_model_outputdisagree on batch size because this method uses the prompt-derived size while_encode_prompt()uses the embedding-derived size.Suggested fix
- if prompt is not None: + if prompt is None: + if text_model_output is None or text_attention_mask is None: + raise ValueError( + "Pass either `prompt` or both `text_model_output` and `text_attention_mask`." + ) + batch_size = text_model_output[0].shape[0] + else: if isinstance(prompt, str): batch_size = 1 elif isinstance(prompt, list): batch_size = len(prompt) else: raise ValueError(f"`prompt` has to be of type `str` or `list` but is {type(prompt)}") - else: - batch_size = text_model_output[0].shape[0] + + if text_model_output is not None and text_model_output[0].shape[0] != batch_size: + raise ValueError("`prompt` and `text_model_output` must have the same batch size.")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ldm/modules/karlo/diffusers_pipeline.py` around lines 311 - 329, When prompt is None, the code currently indexes text_model_output[0] and may accept text_model_output without text_attention_mask, causing unclear crashes and batch-size mismatches with _encode_prompt; update pipeline() to validate the prompt/text_model_output contract: if prompt is None require text_model_output and text_attention_mask to be non-None, derive batch_size from text_model_output[0].shape[0] (not by indexing when None), and if both prompt and text_model_output are supplied ensure their batch sizes agree (raise a clear ValueError on mismatch); also when prompt is provided but text_model_output is present, verify consistency before calling _encode_prompt(); reference symbols: prompt, text_model_output, text_attention_mask, batch_size, _encode_prompt, pipeline().ldm/modules/encoders/modules.py-323-329 (1)
323-329:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
clip_stats_path=Noneonly works for 256-d embeddings.The fallback
data_mean/data_stdare sized fromtimestep_dim(default256), butforward()applies them to the actual CLIP embedding width. Any encoder that does not emit 256 features will shape-mismatch here. Using scalar0/1defaults, or a separate embedding-dim parameter, keeps the no-stats path generic.Suggested fix
if clip_stats_path is None: - clip_mean, clip_std = torch.zeros(timestep_dim), torch.ones(timestep_dim) + clip_mean, clip_std = torch.zeros(1), torch.ones(1) else: clip_mean, clip_std = torch.load(clip_stats_path, map_location="cpu")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ldm/modules/encoders/modules.py` around lines 323 - 329, The fallback when clip_stats_path is None assumes timestep_dim (256) which breaks when CLIP embeddings have a different width; update the fallback in the constructor so that data_mean and data_std are created to match the actual CLIP embedding width used in forward() instead of hardcoding timestep_dim: either use scalar defaults (torch.tensor(0.) / torch.tensor(1.)) registered as buffers so broadcasting works, or accept/derive an explicit embedding_dim parameter and construct clip_mean/clip_std of size embedding_dim before calling self.register_buffer("data_mean", ...) and self.register_buffer("data_std", ...); ensure the symbols clip_stats_path, timestep_dim, data_mean, data_std and forward() are consistent after the change.ldm/modules/karlo/kakao/models/decoder_model.py-101-113 (1)
101-113:⚠️ Potential issue | 🟠 Major | ⚡ Quick winFix the invalid default for
timestep_respacing.
forward()defaultstimestep_respacingtoNone, butget_sample_fn()unconditionally callsstartswith()on it. That makes the default call path unusable. Either resolveNoneto the configured respacing or require the argument explicitly.Also applies to: 115-124, 154-155
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ldm/modules/karlo/kakao/models/decoder_model.py` around lines 101 - 113, The code calls timestep_respacing.startswith(...) in get_sample_fn but forward defaults timestep_respacing to None, causing an AttributeError; change get_sample_fn (and the similar places at the other startswith calls) to first resolve a None input to the configured respacing (e.g. timestep_respacing = timestep_respacing or self._diffusion_kwargs.get("timestep_respacing") or a sensible default) before calling startswith, then proceed to call create_gaussian_diffusion(**diffusion_kwargs) and select diffusion.ddim_sample_loop_progressive vs diffusion.p_sample_loop_progressive; alternatively, if you prefer explicitness, raise a clear ValueError when timestep_respacing is None instead of resolving it.ldm/modules/karlo/kakao/sampler.py-95-101 (1)
95-101:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMake the default
progressive_modevalid.Both
__call__methods defaultprogressive_modetoNoneand then immediately rejectNone, so a plainsampler(prompt, bsz)call always raises. Default this to"final"or treatNoneas the final-only mode.Also applies to: 242-248
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ldm/modules/karlo/kakao/sampler.py` around lines 95 - 101, The __call__ methods currently default progressive_mode to None and immediately assert it's one of ("loop","stage","final"), causing plain calls to fail; change the default to "final" or add a guard that treats None as "final" before the assert in the __call__ implementations (refer to the __call__ at lines around the first occurrence and the second __call__ around lines 242-248) so that progressive_mode is valid when omitted; ensure you update both locations (the sampler.__call__ methods) to either use progressive_mode: str = "final" or set progressive_mode = "final" when None prior to the assert.ldm/modules/karlo/kakao/models/prior_model.py-72-80 (1)
72-80:⚠️ Potential issue | 🟠 Major | ⚡ Quick winHandle the default
timestep_respacingpath.
forward()exposestimestep_respacing=None, butget_sample_fn()immediately callsstartswith()on it. A caller that relies on the default will hitAttributeErrorbefore sampling begins. Default this toself._diffusion_kwargs["timestep_respacing"]or rejectNoneexplicitly.Also applies to: 90-98, 126-127
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ldm/modules/karlo/kakao/models/prior_model.py` around lines 72 - 80, The code calls timestep_respacing.startswith(...) without guarding for None; update get_sample_fn to first set timestep_respacing = timestep_respacing or self._diffusion_kwargs.get("timestep_respacing") (or explicitly raise a clear ValueError if None is unacceptable), then proceed to compute use_ddim and build diffusion_kwargs/create_gaussian_diffusion and choose diffusion.ddim_sample_loop vs diffusion.p_sample_loop; make the same fix in the other places that call .startswith (e.g., the forward caller that exposes timestep_respacing=None and any other methods around the create_gaussian_diffusion / diffusion sample selection blocks) so None is handled consistently.ldm/modules/karlo/kakao/template.py-87-92 (1)
87-92:⚠️ Potential issue | 🟠 Major | ⚡ Quick winResolve the config YAML paths before loading them.
These loaders hand raw relative strings to
OmegaConf.load(), and the defaults here are also missing theconfigs/karlo/prefix used by the added files. That means the defaulted APIs are broken, and even explicit Karlo paths only work when the caller's current working directory happens to be the repo root.Also applies to: 95-95, 114-114, 117-117, 130-130, 133-133
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ldm/modules/karlo/kakao/template.py` around lines 87 - 92, The loader passes raw relative paths to OmegaConf.load which breaks unless CWD is repo root; in load_prior resolve prior_config and clip_stat_path to absolute paths before calling OmegaConf.load by joining them (when not absolute) to the package's configs/karlo directory (use Path(__file__).resolve().parent / "configs" / "karlo" or equivalent) and convert to str; apply the same path-resolution logic to the other Karlo config/stat parameters referenced elsewhere in this module so any relative default like "configs/prior_1B_vit_l.yaml" is converted to an absolute path prior to OmegaConf.load or file reads.ldm/modules/karlo/kakao/models/clip.py-37-39 (1)
37-39:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAvoid deleting third-party
CLIP.forwardglobally (process-wide mutation)
ldm/modules/karlo/kakao/models/clip.pyremovesclip.model.CLIP.forwardat import time (delattr(...)), mutating the external CLIP class for the whole process.CustomizedCLIP.forward(image, text)then callssuper().forward(image, text)after that deletion and doesn’t return anything, so any eager use ofclip(image, text)will fail/returnNone.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ldm/modules/karlo/kakao/models/clip.py` around lines 37 - 39, The code currently deletes the external CLIP.forward via delattr(clip.model.CLIP, "forward"), causing a process-wide mutation and breaking super() calls; remove that delattr call and instead implement/use a proper subclass that overrides forward: keep the LayerNorm assignment (clip.model.LayerNorm = LayerNorm), ensure CustomizedCLIP inherits from clip.model.CLIP, implement CustomizedCLIP.forward(self, image, text) to call result = super().forward(image, text) and return that result (do not drop the return), and instantiate or register that subclass locally where the model is created rather than mutating clip.model.CLIP globally.scripts/txt2img.py-180-185 (1)
180-185:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDisable CPU autocast by default when
--device=cpu
--devicenow defaults to"cpu"while--precisiondefaults to"autocast", so the no-flag path enablestorch.autocast("cpu")viaprecision_scope(opt.device)even when--bf16is not set (CPU autocast defaults to bf16).Suggested fix
- precision_scope = autocast if opt.precision=="autocast" or opt.bf16 else nullcontext + use_cuda_autocast = opt.precision == "autocast" and opt.device == "cuda" + use_cpu_bf16 = opt.device == "cpu" and opt.bf16 + precision_scope = autocast if use_cuda_autocast or use_cpu_bf16 else nullcontext🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/txt2img.py` around lines 180 - 185, The CLI defaults result in torch.autocast("cpu") being enabled unintentionally; after parsing arguments, check opt.device and opt.precision and disable CPU autocast when device=="cpu" by changing the precision to a non-autocast mode (e.g., set opt.precision="fp32") before calling precision_scope(opt.device); locate the parser.add_argument block for "--device" and the code that uses precision_scope(opt.device) and add a small guard like: if opt.device == "cpu" and opt.precision == "autocast": opt.precision = "fp32" (optionally log a warning) so CPU runs do not default into bf16/autocast.
🟡 Minor comments (11)
configs/stable-diffusion/intel/v2-inference-fp32.yaml-35-35 (1)
35-35:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAddress or remove the flash-attention TODO.
The comment
# need to fix for flash-attnindicates incomplete work. Either document what needs fixing for flash-attention compatibility or remove the comment if the currentnum_head_channels: 64value is correct.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@configs/stable-diffusion/intel/v2-inference-fp32.yaml` at line 35, The inline TODO on num_head_channels is incomplete; either remove the comment or replace it with a clear action. Update the YAML entry for num_head_channels (currently set to 64) by either deleting the trailing comment "# need to fix for flash-attn" if 64 is confirmed correct, or replace it with a short note documenting the required value and reasoning for flash-attn compatibility (e.g., required head size or reference to matching num_heads), so future maintainers know whether/what needs changing.configs/karlo/improved_sr_64_256_1.4B.yaml-26-26 (1)
26-26:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winClarify the
# fixcomment.The inline comment
# fixsuggests uncertainty about thetimestep_respacingvalue. If'7'is the correct value for 7-step sampling (as used in ImprovedSupRes64to256ProgressiveModel per the stack context), remove the comment. Otherwise, document what needs to be fixed.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@configs/karlo/improved_sr_64_256_1.4B.yaml` at line 26, The inline comment "# fix" next to timestep_respacing is ambiguous; update the config to either remove the comment if the value '7' is correct for 7-step sampling used by ImprovedSupRes64to256ProgressiveModel, or replace the comment with a clear note describing the intended spacing (e.g., "7-step sampling" or "matches model's progressive schedule") so readers know whether timestep_respacing = '7' is intentional or requires adjustment.configs/stable-diffusion/intel/v2-inference-bf16.yaml-36-36 (1)
36-36:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAddress or remove the flash-attention TODO.
The comment
# need to fix for flash-attnindicates incomplete work. Either document what needs fixing for flash-attention compatibility or remove the comment if the currentnum_head_channels: 64value is correct.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@configs/stable-diffusion/intel/v2-inference-bf16.yaml` at line 36, The inline TODO on num_head_channels is ambiguous; either remove the comment or replace it with a precise note about flash-attn requirements. Update the line referencing num_head_channels to either drop the "# need to fix for flash-attn" comment if 64 is confirmed correct, or replace it with a short, specific instruction explaining what must change for flash‑attention (e.g., required per-head channel size, alignment, or compatibility check) and reference the setting name num_head_channels so future maintainers know what to verify when enabling flash-attn.configs/stable-diffusion/intel/v2-inference-v-bf16.yaml-37-37 (1)
37-37:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAddress or remove the flash-attention TODO.
The comment
# need to fix for flash-attnindicates incomplete work. Either document what needs fixing for flash-attention compatibility or remove the comment if the currentnum_head_channels: 64value is correct.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@configs/stable-diffusion/intel/v2-inference-v-bf16.yaml` at line 37, The inline TODO comment about flash-attention on the num_head_channels setting is incomplete; update or remove it: if 64 is verified to be correct for flash-attn, remove the comment and add a short clarifying comment like "compatible with flash-attn"; otherwise, document the exact change required (e.g., target head channel size or calculation) and adjust num_head_channels accordingly. Locate the num_head_channels entry in the config (the line containing "num_head_channels: 64") and either delete the "# need to fix for flash-attn" note or replace it with a clear, actionable comment and/or the corrected numeric value so the config and comment are consistent.configs/stable-diffusion/v2-1-stable-unclip-h-inference.yaml-45-45 (1)
45-45:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAddress or remove the flash-attention TODO.
The comment
# need to fix for flash-attnindicates incomplete work. Either document what needs fixing for flash-attention compatibility or remove the comment if the currentnum_head_channels: 64value is correct.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@configs/stable-diffusion/v2-1-stable-unclip-h-inference.yaml` at line 45, The inline TODO on the num_head_channels setting is ambiguous; either remove the comment or explain what to change for flash-attn compatibility. Update the YAML entry for num_head_channels: 64 by either deleting the comment "# need to fix for flash-attn" if 64 is confirmed correct, or replacing it with a short note that specifies the required value or condition for flash-attn (e.g., expected head dim, multiple-of constraint, or a link/reference to the calculation), referencing the num_head_channels key so reviewers can verify it against the flash-attn requirements.configs/stable-diffusion/intel/v2-inference-v-fp32.yaml-36-36 (1)
36-36:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAddress or remove the flash-attention TODO.
The comment
# need to fix for flash-attnindicates incomplete work. Either document what needs fixing for flash-attention compatibility or remove the comment if the currentnum_head_channels: 64value is correct.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@configs/stable-diffusion/intel/v2-inference-v-fp32.yaml` at line 36, The inline TODO on num_head_channels should be resolved: either remove the comment or replace it with a concrete note describing required changes for flash-attn compatibility. Locate the num_head_channels: 64 entry and if 64 is correct, delete the "# need to fix for flash-attn" comment; if not, update the comment to state the exact required value or transformation (e.g., match head_dim or num_heads expectations) and reference flash-attn constraints so future readers know the exact fix needed.doc/UNCLIP.MD-51-51 (1)
51-51:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFix grammatical error.
The word "than" should be "that" in this context.
📝 Proposed fix
-to launch a streamlit script than can be used to make image variations with both models (CLIP-L and OpenCLIP-H). +to launch a streamlit script that can be used to make image variations with both models (CLIP-L and OpenCLIP-H).As per static analysis, this matches the LanguageTool grammar check finding.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@doc/UNCLIP.MD` at line 51, Replace the incorrect word "than" with "that" in the sentence containing "to launch a streamlit script than can be used to make image variations with both models (CLIP-L and OpenCLIP-H)" in UNCLIP.MD so the phrase reads "to launch a streamlit script that can be used..." to fix the grammatical error.requirements.txt-16-16 (1)
16-16:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUpdate Kornia dependency pin to the latest stable release.
kornia==0.6is outdated (latest stable is kornia 0.8.2, released 2025-11-08). Upgrade the pin and verify your code/tests against any API changes.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@requirements.txt` at line 16, Update the Kornia dependency pin in requirements.txt from "kornia==0.6" to the latest stable "kornia==0.8.2" (replace the existing "kornia==0.6" entry), then run the test suite and any image-processing integration tests to catch API changes; if failures occur, adjust usages of Kornia APIs referenced in your codebase (search for imports/usages of "kornia" and functions/classes that may have changed) to the new 0.8.2 API.scripts/streamlit/stableunclip.py-342-360 (1)
342-360:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winReject zero-sum input weights before normalizing.
The sliders allow negative weights, but Line 360 divides by
sum(weights)without checking for0. A simple+1 / -1mix yieldsinf/NaNconditioning and unstable sampling.🧮 Suggested fix
- adm_cond = torch.stack(adm_inputs).sum(0) / sum(weights) + total_weight = sum(weights) + if total_weight == 0: + st.error("Input weights must not sum to 0.") + st.stop() + adm_cond = torch.stack(adm_inputs).sum(0) / total_weight🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/streamlit/stableunclip.py` around lines 342 - 360, The code computes adm_cond = torch.stack(adm_inputs).sum(0) / sum(weights) but doesn’t guard against sum(weights) == 0; update the logic after building weights/adm_inputs (around the adm_inputs/weights aggregation and make_conditionings_from_input usage) to detect total_weight = sum(weights) == 0 and reject/handle it: e.g., show an error/message in the Streamlit UI and prevent normalization/sampling until weights are non‑zero, or fallback to a safe default normalization (like using 1.0) so you never divide by zero; ensure the check references the weights list and the adm_inputs aggregation (adm_inputs, weights, adm_cond) so the UI prevents or corrects zero-sum inputs before calling torch.stack(...)/division.scripts/streamlit/stableunclip.py-245-249 (1)
245-249:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winInitialize
msgas a string before appending metadata.
msgstarts asNone, but Line 249 appends to it whenevermodel_ema.num_updatesexists. Checkpoints withoutglobal_stepwill crash here even though that metadata is optional.🧾 Suggested fix
- msg = None + msg = "" if "global_step" in pl_sd: msg = f"This is global step {pl_sd['global_step']}. " if "model_ema.num_updates" in pl_sd["state_dict"]: msg += f"And we got {pl_sd['state_dict']['model_ema.num_updates']} EMA updates." @@ - return model, msg + return model, (msg or None)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/streamlit/stableunclip.py` around lines 245 - 249, The bug is that msg is initialized to None and later appended to when checking pl_sd["state_dict"]["model_ema.num_updates"], which crashes if "global_step" is absent; change initialization so msg is a string (e.g., msg = "" or set msg = "" when creating it) or guard the append by converting None to "" before using +=; update the variables in this block (msg, pl_sd, pl_sd["state_dict"], "model_ema.num_updates") so msg is always a string before any concatenation.scripts/txt2img.py-218-219 (1)
218-219:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFail fast when
--device cudais unavailable.
torch.device("cuda")doesn’t check runtime CUDA availability; it just creates a CUDA device handle, so failures get deferred until model/sampling. Add an earlytorch.cuda.is_available()guard when--device cudais requested.Suggested fix
config = OmegaConf.load(f"{opt.config}") device = torch.device("cuda") if opt.device == "cuda" else torch.device("cpu") + if device.type == "cuda" and not torch.cuda.is_available(): + raise ValueError("--device cuda was requested, but CUDA is not available") model = load_model_from_config(config, f"{opt.ckpt}", device)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/txt2img.py` around lines 218 - 219, When opt.device == "cuda" the code currently constructs torch.device("cuda") without verifying CUDA is actually available; add an explicit check using torch.cuda.is_available() before creating device and loading the model (affecting the device variable and the load_model_from_config call) and fail fast with a clear error/log and exit if CUDA is not available so load_model_from_config(...) is never called on a non-functional CUDA device.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: a40fc073-b5f1-407c-a83f-fa7d024a3bfa
⛔ Files ignored due to path filters (8)
assets/stable-samples/stable-unclip/houses_out.jpegis excluded by!**/*.jpegassets/stable-samples/stable-unclip/oldcar000.jpegis excluded by!**/*.jpegassets/stable-samples/stable-unclip/oldcar500.jpegis excluded by!**/*.jpegassets/stable-samples/stable-unclip/oldcar800.jpegis excluded by!**/*.jpegassets/stable-samples/stable-unclip/panda.jpgis excluded by!**/*.jpgassets/stable-samples/stable-unclip/plates_out.jpegis excluded by!**/*.jpegassets/stable-samples/stable-unclip/unclip-variations.pngis excluded by!**/*.pngassets/stable-samples/stable-unclip/unclip-variations_noise.pngis excluded by!**/*.png
📒 Files selected for processing (45)
.gitignoreREADME.mdcheckpoints/checkpoints.txtconfigs/karlo/decoder_900M_vit_l.yamlconfigs/karlo/improved_sr_64_256_1.4B.yamlconfigs/karlo/prior_1B_vit_l.yamlconfigs/stable-diffusion/intel/v2-inference-bf16.yamlconfigs/stable-diffusion/intel/v2-inference-fp32.yamlconfigs/stable-diffusion/intel/v2-inference-v-bf16.yamlconfigs/stable-diffusion/intel/v2-inference-v-fp32.yamlconfigs/stable-diffusion/v2-1-stable-unclip-h-inference.yamlconfigs/stable-diffusion/v2-1-stable-unclip-l-inference.yamldoc/UNCLIP.MDldm/models/diffusion/ddim.pyldm/models/diffusion/ddpm.pyldm/models/diffusion/dpm_solver/dpm_solver.pyldm/models/diffusion/dpm_solver/sampler.pyldm/models/diffusion/plms.pyldm/modules/attention.pyldm/modules/diffusionmodules/openaimodel.pyldm/modules/diffusionmodules/util.pyldm/modules/encoders/modules.pyldm/modules/karlo/__init__.pyldm/modules/karlo/diffusers_pipeline.pyldm/modules/karlo/kakao/__init__.pyldm/modules/karlo/kakao/models/__init__.pyldm/modules/karlo/kakao/models/clip.pyldm/modules/karlo/kakao/models/decoder_model.pyldm/modules/karlo/kakao/models/prior_model.pyldm/modules/karlo/kakao/models/sr_256_1k.pyldm/modules/karlo/kakao/models/sr_64_256.pyldm/modules/karlo/kakao/modules/__init__.pyldm/modules/karlo/kakao/modules/diffusion/gaussian_diffusion.pyldm/modules/karlo/kakao/modules/diffusion/respace.pyldm/modules/karlo/kakao/modules/nn.pyldm/modules/karlo/kakao/modules/resample.pyldm/modules/karlo/kakao/modules/unet.pyldm/modules/karlo/kakao/modules/xf.pyldm/modules/karlo/kakao/sampler.pyldm/modules/karlo/kakao/template.pyldm/util.pymodelcard.mdrequirements.txtscripts/streamlit/stableunclip.pyscripts/txt2img.py
| image_size: 64 | ||
| num_channels: 320 | ||
| num_res_blocks: 3 | ||
| channel_mult: '' |
There was a problem hiding this comment.
channel_mult should be a list, not an empty string.
The channel_mult parameter is set to an empty string, but UNet architectures expect a list of integer multipliers (e.g., [1, 2, 4, 4] as seen in the Stable Diffusion configs). This will cause a runtime error when the model attempts to iterate over channel multipliers during initialization.
🔧 Proposed fix
- channel_mult: ''
+ channel_mult: [1, 2, 4, 4]📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| channel_mult: '' | |
| channel_mult: [1, 2, 4, 4] |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@configs/karlo/decoder_900M_vit_l.yaml` at line 8, The channel_mult parameter
is currently an empty string which will break UNet initialization; update
channel_mult to be a list of integer multipliers (for example, 1,2,4,4) instead
of an empty string so the model can iterate over channel multipliers during
setup; locate the channel_mult entry in the YAML and replace the empty-string
value with a list of integers appropriate for the UNet configuration.
| def _init_embedder(self, config, freeze=True): | ||
| embedder = instantiate_from_config(config) | ||
| if freeze: | ||
| self.embedder = embedder.eval() | ||
| self.embedder.train = disabled_train | ||
| for param in self.embedder.parameters(): | ||
| param.requires_grad = False |
There was a problem hiding this comment.
freeze_embedder=False is broken.
self.embedder is only assigned inside the frozen branch, so disabling freeze_embedder makes the first self.embedder(img) call fail with AttributeError. If this flag is intended to support fine-tuning, the embedder also needs to stay registered in the non-frozen path.
Suggested fix
def _init_embedder(self, config, freeze=True):
- embedder = instantiate_from_config(config)
+ self.embedder = instantiate_from_config(config)
if freeze:
- self.embedder = embedder.eval()
+ self.embedder = self.embedder.eval()
self.embedder.train = disabled_train
for param in self.embedder.parameters():
param.requires_grad = False🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@ldm/models/diffusion/ddpm.py` around lines 1813 - 1819, The _init_embedder
function only assigns self.embedder inside the freeze branch, so when
freeze=False self.embedder is never registered and calls like self.embedder(img)
raise AttributeError; fix by assigning self.embedder = embedder before the if
freeze: block (so the module is always registered), and inside the if freeze:
branch call self.embedder = embedder.eval(), set self.embedder.train =
disabled_train and set param.requires_grad = False; when freeze is False, do not
override self.embedder.train or disable gradients so fine-tuning works (leave
parameters' requires_grad untouched). Ensure you reference _init_embedder,
instantiate_from_config, disabled_train, and self.embedder when making the
change.
| img = batch[self.embed_key][:bs] | ||
| img = rearrange(img, 'b h w c -> b c h w') | ||
| c_adm = self.embedder(img) |
There was a problem hiding this comment.
Move the embedding image onto self.device before encoding.
LatentDiffusion.get_input() moves the VAE/conditioning tensors to the module device, but this branch leaves img on the dataloader device. On GPU/XPU runs, self.embedder(img) will fail with a device mismatch.
Suggested fix
img = batch[self.embed_key][:bs]
img = rearrange(img, 'b h w c -> b c h w')
+ img = img.to(self.device, memory_format=torch.contiguous_format).float()
c_adm = self.embedder(img)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| img = batch[self.embed_key][:bs] | |
| img = rearrange(img, 'b h w c -> b c h w') | |
| c_adm = self.embedder(img) | |
| img = batch[self.embed_key][:bs] | |
| img = rearrange(img, 'b h w c -> b c h w') | |
| img = img.to(self.device, memory_format=torch.contiguous_format).float() | |
| c_adm = self.embedder(img) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@ldm/models/diffusion/ddpm.py` around lines 1835 - 1837, The img tensor taken
from batch[self.embed_key] is not moved to the model device before calling
self.embedder, causing device-mismatch errors; before calling self.embedder(img)
(in the block using img and c_adm), move img to self.device (e.g., img =
img.to(self.device)) after rearrange so the embedding runs on the same device as
the module (referencing img, self.embed_key, self.embedder, and
LatentDiffusion.get_input for consistency).
| from ldm.modules.karlo.kakao.models.sr_64_256 import SupRes64to256Progressive | ||
|
|
||
|
|
||
| class SupRes256to1kProgressive(SupRes64to256Progressive): | ||
| pass # no difference currently |
There was a problem hiding this comment.
Fix the broken base-class import.
sr_64_256.py does not define SupRes64to256Progressive, so importing this module will fail immediately. The subclass needs to inherit from the class that actually exists.
Suggested fix
-from ldm.modules.karlo.kakao.models.sr_64_256 import SupRes64to256Progressive
+from ldm.modules.karlo.kakao.models.sr_64_256 import (
+ ImprovedSupRes64to256ProgressiveModel,
+)
-class SupRes256to1kProgressive(SupRes64to256Progressive):
+class SupRes256to1kProgressive(ImprovedSupRes64to256ProgressiveModel):
pass # no difference currently📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| from ldm.modules.karlo.kakao.models.sr_64_256 import SupRes64to256Progressive | |
| class SupRes256to1kProgressive(SupRes64to256Progressive): | |
| pass # no difference currently | |
| from ldm.modules.karlo.kakao.models.sr_64_256 import ( | |
| ImprovedSupRes64to256ProgressiveModel, | |
| ) | |
| class SupRes256to1kProgressive(ImprovedSupRes64to256ProgressiveModel): | |
| pass # no difference currently |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@ldm/modules/karlo/kakao/models/sr_256_1k.py` around lines 6 - 10, The file
imports and subclasses SupRes64to256Progressive which does not exist in
sr_64_256.py; update the import and inheritance so SupRes256to1kProgressive
extends the actual class defined in sr_64_256.py (replace
SupRes64to256Progressive with the real class name exported from sr_64_256.py) —
modify the import statement and the class declaration for
SupRes256to1kProgressive to reference that existing symbol.
…nchises
Researched what 433 (77M followers) and B/R Football do and codified it:
- franchises.ts: recurring named series with fixed visual cue + predictable
cadence ('build a show, not a feed') — the top accounts' nlile#1 structural lever.
Match-day-aware (react fast like 433: Predicted XI -> Matchday Verdict ->
Stat Bomb) plus a weekly appointment rhythm for non-match days.
- Each episode carries a recognizable caption header, a signature interactive
CTA (save/send/comment mechanic), and a recurring on-image cue.
- captions.ts: '433-style locker-room' voice for the LLM path; franchise header
+ series CTA in both LLM and template paths.
- planner/visuals/report wired to franchises; new 'wm-hub shows' command.
- STRATEGY.md documents the franchise lineup and the rationale.
Summary by CodeRabbit
Release Notes
New Features
Documentation
Dependencies