Bot-69912020

Bot-69912020 t1_j24hkd7 wrote

It might be more transparent to split your approach into two steps. First, we try to get a valid probability vector for each prediction (i.e., the vector sums to 1). Second, we try to recalibrate the probabilities in each vector to improve the correctness of the predicted probabilities.

For the first point, it is important to know the range of your invalid outputs: If they are negative as well as positive, you might want to transform your whole output via the softmax function. If you only have positive values v1, ..., vm, but their sum is not 1, then it is sufficient to compute vi / (v1+...+vm) to get valid probabilities.
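As a rough sketch of this first step (the helper name `to_probabilities` is mine, and checking for negative values is just one heuristic for choosing between the two cases):

```python
import numpy as np

def to_probabilities(v, use_softmax=None):
    """Map a raw score vector to a valid probability vector.

    If scores can be negative, use softmax; if all scores are
    positive, dividing by their sum is enough.
    """
    v = np.asarray(v, dtype=float)
    if use_softmax is None:
        use_softmax = bool(np.any(v < 0))
    if use_softmax:
        z = v - v.max()          # subtract max for numerical stability
        e = np.exp(z)
        return e / e.sum()
    return v / v.sum()

print(to_probabilities([2.0, 1.0, 1.0]))   # sum-normalize: [0.5 0.25 0.25]
print(to_probabilities([2.0, -1.0, 0.5]))  # mixed signs: softmax
```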

Now, we can try to improve the predicted probabilities via post-hoc recalibration. Several methods have been proposed for this, but the simplest baseline, which works surprisingly well in most cases, is temperature scaling. Start with that and try to make it work - it almost always gives at least minor improvements in ECE and NLL (don't use ECE alone, it is unreliable; see Fig. 2). Once TS works, you can still try out ensemble temperature scaling, parametrized temperature scaling, intra-order-preserving scaling, splines, ...
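A minimal temperature-scaling sketch, assuming you have validation logits and labels as numpy arrays (the grid search is just the simplest way to fit the single parameter T; in practice you would use a proper optimizer like L-BFGS, and the toy data here is mine):

```python
import numpy as np

def nll(logits, labels, T):
    # softmax cross-entropy of temperature-scaled logits
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def fit_temperature(logits, labels, grid=np.linspace(0.05, 10.0, 400)):
    # one scalar parameter: pick the T minimizing validation NLL
    return min(grid, key=lambda T: nll(logits, labels, T))

# toy overconfident model: correct class tends to win, but the
# logits are inflated, so predicted probabilities are too extreme
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=500)
logits = rng.normal(0.0, 1.0, size=(500, 3))
logits[np.arange(500), labels] += 2.0
logits *= 5.0  # inflate -> overconfidence, so the fitted T ends up > 1
T = fit_temperature(logits, labels)
print("fitted T:", T)
```

Note that a single T > 1 softens all predictions uniformly and never changes the argmax, so accuracy is untouched.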

Some of these methods (including temperature scaling) take logits as inputs and output logits again. So, to obtain logits, apply the multivariate logit function if you already have probabilities, or simply use your untransformed outputs as logits if you would have used softmax in the first step.
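A sketch of the conversion in both directions (since softmax is invariant to additive shifts per row, log-probabilities are a valid choice of logits):

```python
import numpy as np

def probs_to_logits(p, eps=1e-12):
    # multivariate logit: log-probabilities are valid logits
    # (logits are only defined up to an additive constant per row)
    return np.log(np.clip(p, eps, 1.0))

def logits_to_probs(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

p = np.array([[0.7, 0.2, 0.1]])
print(logits_to_probs(probs_to_logits(p)))  # round trip recovers p
```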

2

Bot-69912020 t1_iyf8gr7 wrote

When my prof tried to get the conference badge, he accidentally queued at WorkBoat and only realized it when they rejected him lol

Would have been really funny if they had gone through with it and he ended up walking through a boating conference, having no clue what was going on.

2

Bot-69912020 t1_ivxbxml wrote

I don't know about each specific implementation, but via the definition of subgradients you can get 'derivatives' of convex but non-differentiable functions (which ReLU is).

More formally: A subgradient of a convex function f at a point x is any x' such that f(y) ≥ f(x) + ⟨x', y − x⟩ for all y. The set of all possible subgradients at a point x is called the subdifferential of f at x.
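A quick numerical check of this inequality for ReLU at its kink x = 0, where the subdifferential is the whole interval [0, 1] (any g in it works as a "derivative"; this toy check is mine):

```python
# numeric check of the subgradient inequality f(y) >= f(x) + g*(y - x)
# for f = ReLU at x = 0, where any g in [0, 1] is a subgradient
import numpy as np

relu = lambda t: max(t, 0.0)
x = 0.0
ys = np.linspace(-5, 5, 101)
for g in [0.0, 0.3, 1.0]:   # all valid subgradients at 0
    assert all(relu(y) >= relu(x) + g * (y - x) for y in ys)
for g in [-0.1, 1.1]:       # outside the subdifferential [0, 1]
    assert not all(relu(y) >= relu(x) + g * (y - x) for y in ys)
print("subgradient inequality verified")
```

Autodiff frameworks just commit to one element of the subdifferential at 0 (typically g = 0).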

For more details, see here.

17

Bot-69912020 t1_itfqpz6 wrote

Illustration of why it happens in low dimensions: https://twitter.com/adad8m/status/1582231644223987712

I think the main problem is that all textbooks introduce the bias-variance tradeoff as something close to a theoretical law, while in reality it is just an empirical observation, and we simply hadn't bothered to check this observation across more settings... until now
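For reference, the decomposition the "tradeoff" is stated about can be estimated empirically by Monte Carlo (a toy sketch of mine with a cubic fit to a noisy sine; all constants are arbitrary):

```python
# Monte-Carlo estimate of the pointwise decomposition
#   E[(y - f_hat(x))^2] = bias^2 + variance + noise
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)   # ground truth
sigma = 0.3                           # noise standard deviation
x_test = 0.35                         # point where we decompose the error
degree, n_train, n_rep = 3, 30, 2000

preds = np.empty(n_rep)
for r in range(n_rep):
    x = rng.uniform(0, 1, n_train)
    y = f(x) + rng.normal(0, sigma, n_train)
    coef = np.polyfit(x, y, degree)   # refit on a fresh training set
    preds[r] = np.polyval(coef, x_test)

bias2 = (preds.mean() - f(x_test)) ** 2
var = preds.var()
mse = ((preds - (f(x_test) + rng.normal(0, sigma, n_rep))) ** 2).mean()
print(bias2, var, mse)  # mse ~ bias2 + var + sigma^2
```

The decomposition itself is exact; what's empirical is the claim that bias² and variance must trade off monotonically as model capacity grows.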

6