Bot-69912020
Bot-69912020 t1_iyf8gr7 wrote
Reply to comment by ThisIsMyStonerAcount in [D] I'm at NeurIPS, AMA by ThisIsMyStonerAcount
When my prof tried to get his conference badge, he accidentally queued at the Workboat one and only realized it when they rejected him lol
Would have been really funny if they had gone through with it and he ended up walking through a boating conference, having no clue what was going on.
Bot-69912020 t1_ixc9soc wrote
Reply to [D] AISTATS 2023 reviews are out by von_oldmann
A 9 and a 3, both with confidence 4 lol. Fortunately, the other reviewers are more positive.
Bot-69912020 t1_ivxbxml wrote
Reply to comment by jrkirby in [R] ZerO Initialization: Initializing Neural Networks with only Zeros and Ones by hardmaru
I don't know about each specific implementation, but via the definition of subgradients you can get 'derivatives' of convex but non-differentiable functions (which ReLU is).
More formally: a subgradient of a convex function f at a point x is any vector x' such that f(y) >= f(x) + ⟨x', y − x⟩ for all y. The set of all possible subgradients of f at x is called the subdifferential of f at x.
For more details, see here.
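To make this concrete, here is a minimal sketch (my own toy code, not any framework's actual implementation) of a subgradient for ReLU. For x > 0 the subdifferential is {1}, for x < 0 it is {0}, and at x = 0 it is the whole interval [0, 1] - any choice there is a valid subgradient; most frameworks just hard-code 0:

```python
import numpy as np

def relu_subgradient(x, g0=0.0):
    """Return one element of the subdifferential of ReLU at each point.

    g0 picks the subgradient at x == 0; any value in [0, 1] is valid.
    """
    assert 0.0 <= g0 <= 1.0
    return np.where(x > 0, 1.0, np.where(x < 0, 0.0, g0))

x = np.array([-2.0, 0.0, 3.0])
print(relu_subgradient(x))  # -> [0. 0. 1.]
```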
Bot-69912020 t1_ive3h2r wrote
Reply to comment by apliens in [D] Git Re-Basin Paper Accused of Misinformation by fryingnem0
Can we assume SPS hasn't contacted the authors before? I don't know. Maybe he did and they ignored him. Anyway, I am waiting for the authors' response before drawing any conclusions.
Bot-69912020 t1_ittw3bj wrote
Reply to comment by rehrev in [D] What things did you learn in ML theory that are, in practice, different? by 4bedoe
Maybe to help your intuition, consider the following: do more parameters really increase model complexity if they are fitted less tightly? Check out this post: https://twitter.com/tengyuma/status/1545101994150531073
Bot-69912020 t1_ittvs0g wrote
Reply to comment by comradeswitch in [D] What things did you learn in ML theory that are, in practice, different? by 4bedoe
Thanks for your clarification - I am sure it is useful for other readers! Given your background, you might be interested in our latest preprint, which offers a more general bias-variance decomposition: https://arxiv.org/pdf/2210.12256.pdf
Bot-69912020 t1_itfqpz6 wrote
Reply to comment by rehrev in [D] What things did you learn in ML theory that are, in practice, different? by 4bedoe
An illustration of why it happens in low dimensions: https://twitter.com/adad8m/status/1582231644223987712
I think the main problem is that most textbooks introduce the bias-variance tradeoff as something close to a theoretical law, while in reality it is just an empirical observation - and we simply hadn't bothered to check this observation across more settings... until now
Bot-69912020 t1_j24hkd7 wrote
Reply to [D] SOTA Multiclass Model Calibration by arcxtriy
It might be more transparent to split your approach into two steps. First, we try to get a valid probability vector for each prediction (i.e., the vector sums to 1 and all entries are non-negative). Second, we recalibrate the probabilities in each vector to improve the correctness of the predicted probabilities.
For the first step, it is important to know the range of your invalid outputs: if they can be negative as well as positive, you might want to transform your whole output via the softmax function. If you only have positive values v1, ..., vm, but the sum of the vector is not 1, then it is sufficient to compute vi / (v1 + ... + vm) to get valid probabilities.
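A minimal sketch of that first step (the function name and the mixed-sign heuristic are my own choices, just for illustration):

```python
import numpy as np

def to_probabilities(scores):
    """Map a vector of raw model outputs to a valid probability vector."""
    scores = np.asarray(scores, dtype=float)
    if (scores < 0).any():
        # Mixed-sign outputs: treat them as logits and apply softmax.
        z = scores - scores.max()  # subtract max for numerical stability
        e = np.exp(z)
        return e / e.sum()
    # Non-negative outputs: simple normalization is enough.
    return scores / scores.sum()

print(to_probabilities([2.0, 1.0, 1.0]))   # -> [0.5 0.25 0.25]
print(to_probabilities([-1.0, 0.0, 1.0]))  # softmax output, sums to 1
```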
Now, we can try to improve the predicted probabilities via post-hoc recalibration. Several methods have been proposed for this, but the simplest baseline, which works surprisingly well in most cases, is temperature scaling. Start with that and try to make it work - it usually gives at least minor improvements in ECE and NLL (don't use ECE alone, it is unreliable; see Fig. 2). Once TS works, you can still try out ensemble temperature scaling, parametrized temperature scaling, intra-order preserving scaling, splines, ...
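The core of temperature scaling is just fitting a single scalar T on held-out validation data by minimizing NLL, then dividing all logits by T. A rough sketch (the grid search and the toy data are my own simplifications; the original method uses a proper optimizer):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(T, logits, labels):
    """Negative log-likelihood of temperature-scaled predictions."""
    p = softmax(logits / T)
    return -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()

def fit_temperature(logits, labels, grid=np.linspace(0.05, 5.0, 100)):
    """Pick the T from a grid that minimizes validation NLL."""
    losses = [nll(T, logits, labels) for T in grid]
    return grid[int(np.argmin(losses))]

# Toy validation set (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=200)
logits = 5.0 * (np.eye(3)[labels] + 0.3 * rng.standard_normal((200, 3)))
T = fit_temperature(logits, labels)
calibrated = softmax(logits / T)  # T > 1 softens, T < 1 sharpens predictions
```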
Some of these methods (including temperature scaling) take logits as inputs, and their outputs are logits again. So, to obtain logits, apply the multivariate logit function if you already have probabilities, or simply use your untransformed outputs as logits if you would have used softmax in the first step.
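Going from probabilities back to logits can be sketched like this (note that logits are only defined up to an additive constant, so log(p) is just one valid choice; the eps clamp is my own safeguard against zero probabilities):

```python
import numpy as np

def probs_to_logits(p, eps=1e-12):
    """One valid set of logits for a probability vector: softmax(log p) == p."""
    return np.log(np.asarray(p, dtype=float) + eps)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

p = np.array([0.7, 0.2, 0.1])
print(softmax(probs_to_logits(p)))  # recovers p (up to eps)
```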