Unlikely-Video-663 t1_izktnhx wrote
You might be able to recast the problem by assuming the labels are actually drawn from some distribution, putting a simple likelihood function over it, and then learning the parameters of that distribution. This is not fully theoretically sound -- you won't capture any epistemic uncertainty, but you'll get most of the aleatoric -- so depending on your use case, it might work.
In practice: use, for example, a Gaussian likelihood, and learn the variance alongside the mean with a Gaussian NLL loss. As long as your samples stay within the training distribution, yada yada, this can work OK-ish ..
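A minimal sketch of that idea in PyTorch, assuming a toy 1D regression problem with heteroscedastic noise (the model architecture and data here are made up for illustration): a small MLP predicts both a mean and a positive variance, trained with `torch.nn.GaussianNLLLoss`.

```python
import torch
import torch.nn as nn

# Hypothetical two-headed MLP: one head predicts the mean, the other the
# (positive) variance of a Gaussian over the label.
class MeanVarNet(nn.Module):
    def __init__(self, in_dim=1, hidden=32):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.mean_head = nn.Linear(hidden, 1)
        self.var_head = nn.Linear(hidden, 1)

    def forward(self, x):
        h = self.body(x)
        mean = self.mean_head(h)
        # softplus + epsilon keeps the predicted variance strictly positive,
        # which GaussianNLLLoss requires
        var = nn.functional.softplus(self.var_head(h)) + 1e-6
        return mean, var

torch.manual_seed(0)
# Toy data whose noise grows with |x| (aleatoric, input-dependent)
x = torch.linspace(-2, 2, 256).unsqueeze(1)
y = torch.sin(3 * x) + 0.3 * torch.abs(x) * torch.randn_like(x)

model = MeanVarNet()
loss_fn = nn.GaussianNLLLoss()  # signature: loss_fn(input, target, var)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for step in range(500):
    mean, var = model(x)
    loss = loss_fn(mean, y, var)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The predicted `var` then serves as a per-input aleatoric uncertainty estimate; as noted above, it says nothing about epistemic uncertainty, and it degrades once inputs drift out of distribution.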
Otherwise, there are plenty of recalibration techniques to get better results
Unlikely-Video-663 t1_j284flc wrote
Reply to [D] In vision transformers, why do tokens correspond to spatial locations and not channels? by stecas
In CNNs you usually already have long-range dependencies channel-wise, and IMHO one of the advantages of ViTs is allowing long-range spatial information flow as well.
So channel-wise tokenization would not improve upon CNNs... maybe?