I was looking at the modifications made to Janner's UNet to implement FiLM conditioning. One thing I noticed was that the self-attention layers were removed. Compare for instance these:
- https://github.com/jannerm/diffuser/blob/7ea422860cc0106e5ca5949d980f04b799d5462c/diffuser/models/temporal.py#L85
- diffusion_policy/diffusion_policy/model/diffusion/conditional_unet1d.py, line 140 in 5ba07ac: `Downsample1d(dim_out) if not is_last else nn.Identity()`

Apart from the missing self-attention and the introduction of FiLM conditioning, everything else is identical. Was there any reason for this design choice? Does the self-attention mess up the FiLM conditioning in the conv-1d layers?
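
To make the comparison concrete, here is a minimal PyTorch sketch of one down stage with both options exposed as flags. It is a simplified illustration, not the code from either repository: `FiLMBlock1d`, `ResidualSelfAttention1d`, and `DownStage` are hypothetical names, standard multi-head attention stands in for the attention layer in diffuser, a strided convolution stands in for `Downsample1d`, and the `film=False` path (conditioning added as a per-channel bias) is an assumption about the non-FiLM baseline.

```python
import torch
import torch.nn as nn


class FiLMBlock1d(nn.Module):
    """Conv1d residual block modulated by a conditioning vector (hypothetical name).

    film=True  -> per-channel scale and bias: h = scale * h + bias
    film=False -> per-channel bias only:      h = h + bias  (assumed baseline)
    """

    def __init__(self, in_ch, out_ch, cond_dim, film=True):
        super().__init__()
        self.film = film
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size=5, padding=2)
        self.norm = nn.GroupNorm(8, out_ch)
        self.act = nn.Mish()
        # FiLM needs two values per channel (scale, bias); the additive path needs one.
        self.cond = nn.Linear(cond_dim, out_ch * (2 if film else 1))
        self.skip = nn.Conv1d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x, cond):
        # x: (batch, in_ch, horizon), cond: (batch, cond_dim)
        h = self.act(self.norm(self.conv(x)))
        c = self.cond(cond).unsqueeze(-1)  # broadcast over the time axis
        if self.film:
            scale, bias = c.chunk(2, dim=1)
            h = scale * h + bias
        else:
            h = h + c
        return h + self.skip(x)


class ResidualSelfAttention1d(nn.Module):
    """Residual self-attention over the time axis (stand-in for diffuser's attention)."""

    def __init__(self, channels, heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):
        # (batch, channels, horizon) -> (batch, horizon, channels) for attention
        t = self.norm(x.transpose(1, 2))
        out, _ = self.attn(t, t, t, need_weights=False)
        return x + out.transpose(1, 2)


class DownStage(nn.Module):
    """Two conditioned blocks, optional attention, then optional downsampling."""

    def __init__(self, in_ch, out_ch, cond_dim, film, attention, is_last=False):
        super().__init__()
        self.block1 = FiLMBlock1d(in_ch, out_ch, cond_dim, film)
        self.block2 = FiLMBlock1d(out_ch, out_ch, cond_dim, film)
        self.attn = ResidualSelfAttention1d(out_ch) if attention else nn.Identity()
        # Strided conv halves the horizon; the last stage keeps it unchanged.
        self.down = (
            nn.Conv1d(out_ch, out_ch, 3, stride=2, padding=1)
            if not is_last
            else nn.Identity()
        )

    def forward(self, x, cond):
        x = self.block1(x, cond)
        x = self.block2(x, cond)
        x = self.attn(x)
        return self.down(x)


x = torch.randn(2, 16, 32)   # (batch, channels, horizon)
cond = torch.randn(2, 64)    # (batch, cond_dim)
with_attention = DownStage(16, 32, 64, film=False, attention=True)
film_only = DownStage(16, 32, 64, film=True, attention=False)
shape_a = with_attention(x, cond).shape
shape_b = film_only(x, cond).shape
```

In this sketch the attention layer only consumes the output of the conditioned blocks, so both flags can be enabled together and the output shapes match either way. Whether the combination helps or hurts training is exactly the open question here.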