with torch.no_grad():
    # generate an image with the concept from ESD model
    z, latent_image_ids = latent_sample(
        transformer,
        noise_scheduler,
        1,
        model_input.shape[1],
        512,
        512,
        emb_p.to(transformer.device),
        pooled_emb_p.to(transformer.device),
        text_ids_p.to(transformer.device),
        start_guidance,
        int(ddim_steps),
        vae_scale_factor,
    )
    # e_0 & e_p
    e_0 = predict_noise(transformer, z, emb_0, pooled_emb_0, text_ids_0, latent_image_ids, guidance=start_guidance, timesteps=t_enc_ddpm.to(transformer.device), CPU_only=True)
    e_p = predict_noise(transformer, z, emb_p, pooled_emb_p, text_ids_p, latent_image_ids, guidance=start_guidance, timesteps=t_enc_ddpm.to(transformer.device), CPU_only=True)
# get conditional score from ESD model
e_n = predict_noise(transformer, z, emb_p, pooled_emb_p, text_ids_p, latent_image_ids, guidance=start_guidance, timesteps=t_enc_ddpm.to(transformer.device), CPU_only=True)
Hi, thanks for your great work!
I have a question regarding the implementation of the ESD loss in
calculate_upper_loss (in calc_loss.py). It seems that
e_n, e_p, and e_0 are all computed using the transformer with LoRA applied. However, my understanding is that LoRA should only be applied when computing e_n, and that e_p and e_0 should be computed using the base (non-LoRA) model. Is this intentional in your implementation, or could this be a mistake? I would appreciate any clarification on the design choice.
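To make the question concrete, here is a minimal toy sketch of the behavior I would have expected. It is not the repo's code: `ToyLoRAModel`, `lora_disabled`, `esd_target`, and the `negative_guidance` value are hypothetical names I made up for illustration, and the "model" is a scalar stand-in rather than the real transformer. I am assuming the usual ESD target, e_0 - eta * (e_p - e_0), built from frozen base-model predictions. If the transformer happens to be wrapped with PEFT, its `disable_adapter()` context manager might play the role of `lora_disabled` here, but I have not checked how LoRA is attached in this repo.

```python
from contextlib import contextmanager


class ToyLoRAModel:
    """Hypothetical scalar stand-in for a LoRA-wrapped transformer (not the repo's model)."""

    def __init__(self, base_weight, lora_a, lora_b):
        self.base_weight = base_weight  # frozen base parameter
        self.lora_a = lora_a            # trainable low-rank factors (rank 1 here)
        self.lora_b = lora_b
        self.lora_enabled = True

    def predict_noise(self, z, cond):
        # The LoRA delta is added to the base weight only while the adapter is enabled.
        delta = self.lora_a * self.lora_b if self.lora_enabled else 0.0
        return [(self.base_weight + delta) * (z_i + cond) for z_i in z]


@contextmanager
def lora_disabled(model):
    """Temporarily switch the adapter off and restore the previous state on exit."""
    previous = model.lora_enabled
    model.lora_enabled = False
    try:
        yield model
    finally:
        model.lora_enabled = previous


def esd_target(e_0, e_p, negative_guidance):
    """Assumed ESD-style target: e_0 - eta * (e_p - e_0)."""
    return [u - negative_guidance * (c - u) for u, c in zip(e_0, e_p)]


model = ToyLoRAModel(base_weight=2.0, lora_a=0.5, lora_b=1.0)
z = [1.0, -1.0]
cond_0, cond_p = 0.0, 1.0  # toy "empty prompt" and "concept prompt" conditions

# Reference predictions come from the base model, so the adapter is off here.
with lora_disabled(model):
    e_0 = model.predict_noise(z, cond_0)
    e_p = model.predict_noise(z, cond_p)

# Only the trainable prediction uses the LoRA weights.
e_n = model.predict_noise(z, cond_p)

target = esd_target(e_0, e_p, negative_guidance=1.0)
loss = sum((n - t) ** 2 for n, t in zip(e_n, target)) / len(e_n)
```

In this sketch, e_0 and e_p do not depend on the LoRA factors at all, so the target stays fixed while e_n is trained. If all three came from the LoRA-enabled model, the target would drift as the adapter is updated, which is the part I am unsure about.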
Thanks in advance!