Beautiful-Gur-9456 OP t1_je3uesm wrote
Reply to comment by hebweb in [P] Consistency: Diffusion in a Single Forward Pass 🚀 by Beautiful-Gur-9456
I haven't done it yet, but I'm working on it! Their suggested sampling procedure requires multiple FID calculations, so I'm figuring out how to incorporate them efficiently.
Their scale is indeed large; it would cost me a few hundred bucks to train on CIFAR-10. My checkpoint was trained at a much smaller scale 😆
Beautiful-Gur-9456 OP t1_je3sung wrote
Reply to comment by Username912773 in [P] Consistency: Diffusion in a Single Forward Pass 🚀 by Beautiful-Gur-9456
I think the reason lies in the difference in the amount of computation rather than architectural differences. Diffusion models get many chances to correct their predictions, but GANs do not.
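A toy numerical illustration of the point above (not from the paper, just a sketch): treat each denoising pass as a step that fixes part of the remaining error, and compare many small corrections against a single pass with the same per-step strength.

```python
# Toy contraction toward a target; numbers are illustrative only.
target = 1.0

def correct(x, step):
    # Each pass fixes a fraction of the remaining error.
    return x + step * (target - x)

x_multi = 0.0
for _ in range(20):            # diffusion-style: many chances to correct
    x_multi = correct(x_multi, 0.3)

x_single = correct(0.0, 0.3)   # GAN-style budget: one forward pass

print(abs(target - x_multi) < abs(target - x_single))  # True
```

After 20 passes the residual error is 0.7²⁰ ≈ 8e-4, versus 0.7 after one pass with the same step size.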
Beautiful-Gur-9456 OP t1_je3qsdu wrote
Reply to comment by geekfolk in [P] Consistency: Diffusion in a Single Forward Pass 🚀 by Beautiful-Gur-9456
Nope. I mean the LPIPS loss, which kinda acts like a discriminator in GANs. We can replace it with MSE without much degradation.
Distilling a SOTA diffusion model is obviously cheating 😂, so I didn't even think of it. In my view, they are just apples and oranges. We can augment diffusion models with GANs and vice versa to get the most out of them, but what's the point? That would make things way more complex. It's clear that diffusion models cannot beat SOTA GANs at one-step generation; GANs have been tailored for that particular task for years. But we're just exploring possibilities, right?
Aside from the complexity, I think it's worth a shot to replace the LPIPS loss and adversarially train it as a discriminator. Using a pre-trained VGG is cheating anyway. That would be an interesting direction to see!
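For concreteness, here's a minimal sketch of the swappable distance being discussed: the default is LPIPS (distance in pre-trained VGG feature space), but plain MSE works too. `FeatureNet` is a random, frozen stand-in for VGG just to show the shape of a perceptual distance; it is not a real LPIPS implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureNet(nn.Module):
    # Hypothetical stand-in for pre-trained VGG features.
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3, padding=1)

    def forward(self, x):
        return F.relu(self.conv(x))

feat = FeatureNet().eval()
for p in feat.parameters():
    p.requires_grad_(False)   # frozen, like the VGG used by LPIPS

def mse_distance(a, b):
    # Pixel-space distance: works with little degradation.
    return F.mse_loss(a, b)

def perceptual_distance(a, b):
    # LPIPS-style: compare in feature space instead of pixel space.
    return F.mse_loss(feat(a), feat(b))

a, b = torch.randn(2, 3, 32, 32), torch.randn(2, 3, 32, 32)
print(float(mse_distance(a, b)) >= 0, float(perceptual_distance(a, b)) >= 0)
```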
Beautiful-Gur-9456 OP t1_je3hxbn wrote
Reply to comment by geekfolk in [P] Consistency: Diffusion in a Single Forward Pass 🚀 by Beautiful-Gur-9456
The training pipeline, honestly, is significantly simpler without adversarial training, so the design space is much smaller.
It's actually reminiscent of GANs since it uses pre-trained networks as a loss function to improve the quality, though it's completely optional. Still, it's a lot easier than trying to solve any kind of minimax problem.
Beautiful-Gur-9456 OP t1_je18sc5 wrote
Reply to comment by noraizon in [P] Consistency: Diffusion in a Single Forward Pass 🚀 by Beautiful-Gur-9456
You're totally right 😅 I think the true novelty here is dropping distillation and introducing a simple BYOL-like formulation. Bootstrapping always feels like magic to me.
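A minimal sketch of that BYOL-style bootstrap (not the author's exact code): an online network is trained to match an EMA copy of itself evaluated at a lower noise level on the same trajectory, so no pre-trained diffusion teacher is needed. `TinyDenoiser` is a hypothetical stand-in for the real UNet, and MSE stands in for the LPIPS distance.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDenoiser(nn.Module):
    # Hypothetical stand-in for the UNet.
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Conv2d(channels + 1, channels, 3, padding=1)

    def forward(self, x, sigma):
        # Broadcast the noise level as an extra input channel.
        s = sigma.view(-1, 1, 1, 1).expand(-1, 1, *x.shape[2:])
        return self.net(torch.cat([x, s], dim=1))

online = TinyDenoiser()
target = copy.deepcopy(online)  # EMA "teacher" bootstrapped from the student
for p in target.parameters():
    p.requires_grad_(False)

def training_step(x0, sigma_hi=2.0, sigma_lo=1.0, decay=0.999):
    noise = torch.randn_like(x0)
    b = x0.shape[0]
    x_hi = x0 + sigma_hi * noise   # noisier point on the trajectory
    x_lo = x0 + sigma_lo * noise   # same trajectory, adjacent lower noise level
    with torch.no_grad():
        y = target(x_lo, torch.full((b,), sigma_lo))  # bootstrap target
    loss = F.mse_loss(online(x_hi, torch.full((b,), sigma_hi)), y)
    loss.backward()
    with torch.no_grad():  # EMA update of the target toward the online weights
        for pt, po in zip(target.parameters(), online.parameters()):
            pt.mul_(decay).add_(po, alpha=1 - decay)
    return loss

loss = training_step(torch.randn(4, 3, 32, 32))
```

The "magic" is that the target network is just a slowly-moving average of the student itself, so both outputs get pulled toward a single consistent function along each trajectory.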
Beautiful-Gur-9456 OP t1_je0bdgm wrote
Reply to comment by ninjasaid13 in [P] Consistency: Diffusion in a Single Forward Pass 🚀 by Beautiful-Gur-9456
Just one UNet inference, that's all you need.
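To make "one UNet inference" concrete, here's a hedged sketch of one-step sampling: draw pure noise at the maximum noise level and map it to a sample in a single forward pass, with no denoising loop. `TinyDenoiser` and `sample_one_step` are illustrative names, not the author's API.

```python
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    # Hypothetical stand-in for the real UNet.
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Conv2d(channels + 1, channels, 3, padding=1)

    def forward(self, x, sigma):
        # Broadcast the noise level as an extra input channel.
        s = sigma.view(-1, 1, 1, 1).expand(-1, 1, *x.shape[2:])
        return self.net(torch.cat([x, s], dim=1))

def sample_one_step(model, shape, sigma_max=80.0):
    # Consistency models map noise straight to a sample:
    # one forward pass, no iterative refinement.
    z = torch.randn(shape) * sigma_max
    sigma = torch.full((shape[0],), sigma_max)
    with torch.no_grad():
        return model(z, sigma)

x = sample_one_step(TinyDenoiser(), (4, 3, 32, 32))
print(x.shape)  # torch.Size([4, 3, 32, 32])
```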
Beautiful-Gur-9456 OP t1_jdzzagc wrote
Reply to comment by CyberDainz in [P] Consistency: Diffusion in a Single Forward Pass 🚀 by Beautiful-Gur-9456
Those are generated samples recorded every 10 epochs during training, not the denoising process. It does look like deblurring though 😊
Beautiful-Gur-9456 OP t1_je5p8bu wrote
Reply to comment by geekfolk in [P] Consistency: Diffusion in a Single Forward Pass 🚀 by Beautiful-Gur-9456
was that a thing? lmao 🤣