Beautiful-Gur-9456 OP t1_je3uesm wrote
Reply to comment by hebweb in [P] Consistency: Diffusion in a Single Forward Pass 🚀 by Beautiful-Gur-9456
I haven't done it yet, but I'm working on it! Their suggested sampling procedure requires multiple FID calculations, so I'm figuring out how to incorporate them efficiently.
Their scale is indeed large; it would cost me a few hundred bucks to train on CIFAR-10. My checkpoint was trained at a much smaller scale 😆
Beautiful-Gur-9456 OP t1_je3sung wrote
Reply to comment by Username912773 in [P] Consistency: Diffusion in a Single Forward Pass 🚀 by Beautiful-Gur-9456
I think the reason lies in the difference in the amount of computation rather than architectural differences. Diffusion models get many chances to correct their predictions, but GANs do not.
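A toy numerical illustration of the point above (not from the paper, just a sketch): treat each denoising pass as a step that fixes part of the remaining error, and compare many small corrections against a single pass with the same per-step strength.

```python
# Toy contraction toward a target; numbers are illustrative only.
target = 1.0

def correct(x, step):
    # Each pass fixes a fraction of the remaining error.
    return x + step * (target - x)

x_multi = 0.0
for _ in range(20):            # diffusion-style: many chances to correct
    x_multi = correct(x_multi, 0.3)

x_single = correct(0.0, 0.3)   # GAN-style budget: one forward pass

print(abs(target - x_multi) < abs(target - x_single))  # True
```

After 20 passes the residual error is 0.7²⁰ ≈ 8e-4, versus 0.7 after one pass with the same step size.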
Beautiful-Gur-9456 OP t1_je3qsdu wrote
Reply to comment by geekfolk in [P] Consistency: Diffusion in a Single Forward Pass 🚀 by Beautiful-Gur-9456
Nope. I mean the LPIPS loss, which kinda acts like a discriminator in GANs. We can replace it with MSE without much degradation.
Distilling a SOTA diffusion model is obviously cheating 😂, so I didn't even think of it. In my view, they are just apples and oranges. We can augment diffusion models with GANs and vice versa to get the most out of them, but what's the point? That would make things way more complex. It's clear that diffusion models cannot beat SOTA GANs at one-step generation; GANs have been tailored for that particular task for years. But we're just exploring possibilities, right?
Aside from the complexity, I think it's worth a shot to replace the LPIPS loss and adversarially train it as a discriminator. Using a pre-trained VGG is cheating anyway. That would be an interesting direction to see!
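For concreteness, here's a minimal sketch of the swappable distance being discussed: the default is LPIPS (distance in pre-trained VGG feature space), but plain MSE works too. `FeatureNet` is a random, frozen stand-in for VGG just to show the shape of a perceptual distance; it is not a real LPIPS implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureNet(nn.Module):
    # Hypothetical stand-in for pre-trained VGG features.
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3, padding=1)

    def forward(self, x):
        return F.relu(self.conv(x))

feat = FeatureNet().eval()
for p in feat.parameters():
    p.requires_grad_(False)   # frozen, like the VGG used by LPIPS

def mse_distance(a, b):
    # Pixel-space distance: works with little degradation.
    return F.mse_loss(a, b)

def perceptual_distance(a, b):
    # LPIPS-style: compare in feature space instead of pixel space.
    return F.mse_loss(feat(a), feat(b))

a, b = torch.randn(2, 3, 32, 32), torch.randn(2, 3, 32, 32)
print(float(mse_distance(a, b)) >= 0, float(perceptual_distance(a, b)) >= 0)
```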
Beautiful-Gur-9456 OP t1_je3hxbn wrote
Reply to comment by geekfolk in [P] Consistency: Diffusion in a Single Forward Pass 🚀 by Beautiful-Gur-9456
The training pipeline, honestly, is significantly simpler without adversarial training, so the design space is much smaller.
It's actually reminiscent of GANs since it uses pre-trained networks as a loss function to improve the quality, though it's completely optional. Still, it's a lot easier than trying to solve any kind of minimax problem.
Beautiful-Gur-9456 OP t1_je18sc5 wrote
Reply to comment by noraizon in [P] Consistency: Diffusion in a Single Forward Pass 🚀 by Beautiful-Gur-9456
You're totally right 😅 I think the true novelty here is dropping distillation and introducing a simple BYOL-like formulation. Bootstrapping always feels like magic to me.
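A minimal sketch of that BYOL-style bootstrap (not the author's exact code): an online network is trained to match an EMA copy of itself evaluated at a lower noise level on the same trajectory, so no pre-trained diffusion teacher is needed. `TinyDenoiser` is a hypothetical stand-in for the real UNet, and MSE stands in for the LPIPS distance.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDenoiser(nn.Module):
    # Hypothetical stand-in for the UNet.
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Conv2d(channels + 1, channels, 3, padding=1)

    def forward(self, x, sigma):
        # Broadcast the noise level as an extra input channel.
        s = sigma.view(-1, 1, 1, 1).expand(-1, 1, *x.shape[2:])
        return self.net(torch.cat([x, s], dim=1))

online = TinyDenoiser()
target = copy.deepcopy(online)  # EMA "teacher" bootstrapped from the student
for p in target.parameters():
    p.requires_grad_(False)

def training_step(x0, sigma_hi=2.0, sigma_lo=1.0, decay=0.999):
    noise = torch.randn_like(x0)
    b = x0.shape[0]
    x_hi = x0 + sigma_hi * noise   # noisier point on the trajectory
    x_lo = x0 + sigma_lo * noise   # same trajectory, adjacent lower noise level
    with torch.no_grad():
        y = target(x_lo, torch.full((b,), sigma_lo))  # bootstrap target
    loss = F.mse_loss(online(x_hi, torch.full((b,), sigma_hi)), y)
    loss.backward()
    with torch.no_grad():  # EMA update of the target toward the online weights
        for pt, po in zip(target.parameters(), online.parameters()):
            pt.mul_(decay).add_(po, alpha=1 - decay)
    return loss

loss = training_step(torch.randn(4, 3, 32, 32))
```

The "magic" is that the target network is just a slowly-moving average of the student itself, so both outputs get pulled toward a single consistent function along each trajectory.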
Beautiful-Gur-9456 OP t1_je0bdgm wrote
Reply to comment by ninjasaid13 in [P] Consistency: Diffusion in a Single Forward Pass 🚀 by Beautiful-Gur-9456
Just one UNet inference, that's all you need.
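To make "one UNet inference" concrete, here's a hedged sketch of one-step sampling: draw pure noise at the maximum noise level and map it to a sample in a single forward pass, with no denoising loop. `TinyDenoiser` and `sample_one_step` are illustrative names, not the author's API.

```python
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    # Hypothetical stand-in for the real UNet.
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Conv2d(channels + 1, channels, 3, padding=1)

    def forward(self, x, sigma):
        # Broadcast the noise level as an extra input channel.
        s = sigma.view(-1, 1, 1, 1).expand(-1, 1, *x.shape[2:])
        return self.net(torch.cat([x, s], dim=1))

def sample_one_step(model, shape, sigma_max=80.0):
    # Consistency models map noise straight to a sample:
    # one forward pass, no iterative refinement.
    z = torch.randn(shape) * sigma_max
    sigma = torch.full((shape[0],), sigma_max)
    with torch.no_grad():
        return model(z, sigma)

x = sample_one_step(TinyDenoiser(), (4, 3, 32, 32))
print(x.shape)  # torch.Size([4, 3, 32, 32])
```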
Beautiful-Gur-9456 OP t1_jdzzagc wrote
Reply to comment by CyberDainz in [P] Consistency: Diffusion in a Single Forward Pass 🚀 by Beautiful-Gur-9456
Those are generated samples recorded every 10 epochs during training, not the denoising process. It does look like deblurring though 😊
Beautiful-Gur-9456 OP t1_je5p8bu wrote
Reply to comment by geekfolk in [P] Consistency: Diffusion in a Single Forward Pass 🚀 by Beautiful-Gur-9456
was that a thing? lmao 🤣