giga-chad99 t1_j45pa30 wrote
Reply to comment by actualsnek in [D] What's your opinion on "neurocompositional computing"? (Microsoft paper from April 2022) by currentscurrents
Regarding that Winoground paper: isn't compositionality what DALL-E 2, Imagen, Parti, etc. are famous for? Like the avocado chair, or very specific images like "a raccoon in a spacesuit playing poker". SOTA vision-language models are the only models that actually show convincing compositionality, or am I wrong?