Beautiful-Gur-9456 OP t1_je3uesm wrote

I haven't done it yet, but I'm working on it! Their suggested sampling procedure requires multiple FID calculations, so I'm thinking about how to incorporate it efficiently.
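Roughly what I have in mind is sketched below. `generate` and `compute_fid` are stand-ins for the sampler and an FID evaluator, and the greedy structure is just my guess, not their exact procedure:

```python
# Hypothetical greedy search over sampling timesteps, scoring each
# candidate schedule by FID. Every candidate needs a full FID pass,
# which is exactly the expensive part I'd like to amortize.
from typing import Callable, List

import torch

def greedy_timestep_search(
    generate: Callable[[List[int]], torch.Tensor],   # sample a batch for a schedule
    compute_fid: Callable[[torch.Tensor], float],    # FID against a fixed real set
    candidate_steps: List[int],                      # e.g. list(range(1, T))
    budget: int = 3,                                 # how many timesteps to pick
) -> List[int]:
    schedule: List[int] = []
    for _ in range(budget):
        best_t, best_fid = None, float("inf")
        for t in candidate_steps:
            if t in schedule:
                continue
            # one full generate + FID evaluation per candidate
            fid = compute_fid(generate(sorted(schedule + [t])))
            if fid < best_fid:
                best_t, best_fid = t, fid
        schedule.append(best_t)
    return sorted(schedule)
```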

Their scale is indeed large; it would cost me a few hundred bucks to train on CIFAR-10. My checkpoint was trained at a much smaller scale 😆

Beautiful-Gur-9456 OP t1_je3qsdu wrote

Nope. I mean the LPIPS loss, which kinda acts like a discriminator in GANs. We can replace it with MSE without much degradation.
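Roughly this kind of toggle is what I mean. The `lpips` package and its `LPIPS(net='vgg')` model are real; the wrapper around it is just a sketch:

```python
import torch.nn.functional as F
import lpips  # pip install lpips

# Frozen VGG-based perceptual distance; expects inputs scaled to [-1, 1].
lpips_fn = lpips.LPIPS(net='vgg')

def reconstruction_loss(pred, target, use_lpips=True):
    if use_lpips:
        # perceptual distance, acting a bit like a frozen discriminator
        return lpips_fn(pred, target).mean()
    # plain pixel-wise MSE also works, with only mild degradation
    return F.mse_loss(pred, target)
```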

Distilling a SOTA diffusion model is obviously cheating 😂, so I didn't even think of it. In my view, they're just apples and oranges. We can augment diffusion models with GANs and vice versa to get the most out of both, but what's the point? That would make things way more complex. It's clear that diffusion models cannot beat SOTA GANs at one-step generation; GANs have been tailored for that particular task for years. But we're just exploring possibilities, right?

Aside from the complexity, I think it's worth a shot to drop the LPIPS loss and adversarially train a discriminator in its place. Using a pre-trained VGG is cheating anyway. That would be an interesting direction to explore!
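A rough sketch of what that could look like, with a made-up tiny discriminator and hinge losses; none of this is from the paper:

```python
import torch
import torch.nn as nn

# Tiny conv discriminator standing in for the frozen VGG/LPIPS network.
disc = nn.Sequential(
    nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(128, 1, 4, stride=1, padding=1),  # patch-level logits
)

def d_loss(real, fake):
    # hinge loss: push real logits above +1, fake logits below -1
    return (torch.relu(1 - disc(real)).mean()
            + torch.relu(1 + disc(fake.detach())).mean())

def g_loss(fake):
    # the generator side tries to push the patch logits up
    return -disc(fake).mean()
```

Of course, this brings back exactly the minimax training I was happy to avoid, so it's a trade-off.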

Beautiful-Gur-9456 OP t1_je3hxbn wrote

The training pipeline, honestly, is significantly simpler without adversarial training, so the design space is much smaller.

It's actually reminiscent of GANs since it uses a pre-trained network as a loss function to improve quality, though that part is completely optional. Still, it's a lot easier than trying to solve any kind of minimax problem.
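For comparison, the whole non-adversarial update is basically one forward pass, one loss, one optimizer step; `model` and `loss_fn` are placeholders here:

```python
def train_step(model, loss_fn, optimizer, x_noisy, x_clean):
    optimizer.zero_grad()
    pred = model(x_noisy)          # single forward pass
    loss = loss_fn(pred, x_clean)  # any distance: MSE, LPIPS, ...
    loss.backward()
    optimizer.step()               # no alternating G/D updates
    return loss.item()
```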
