Red-Portal
Red-Portal t1_jc5g7ap wrote
Reply to comment by currentscurrents in [D]: Generalisation ability of autoencoders by Blutorangensaft
Oh they have been used for compression. I also remember a paper on quantization, which made a buzz at the time.
Red-Portal t1_jc4u84k wrote
What do you mean by generalizing here? Reconstruction of OOD data? Ironically, VAEs are pissing everybody off because they reconstruct OOD data too well. In fact, one of the things people are dying to get working is anomaly detection or OOD detection, but VAEs suck at it despite all attempts. Like your dog who can't guard the house because he really likes strangers, VAEs suck at OOD detection because they reconstruct OOD data too well.
Red-Portal t1_jakr3yf wrote
Reply to [D] Are Genetic Algorithms Dead? by TobusFire
The fundamental problem with evolutionary strategies is that they are a freakin nightmare to evaluate. It's basically impossible to reason about their mathematical properties, experiments are noisy as hell, and how representative are the benchmark objective functions anyway? It's just really hard to do good science with them, which means it's hard to make concrete improvements. Sure, once upon a time they were the only choice for noisy, gradient-free global optimization problems. But now we have Bayesian optimization.
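To make the contrast concrete, here's a toy sketch of Bayesian optimization on a noisy 1-D objective. The RBF kernel, fixed hyperparameters, grid of candidates, and all numbers are illustrative, not a tuned implementation:

```python
import numpy as np
from scipy.stats import norm

# Toy Bayesian optimization (minimization) of a noisy 1-D function.
rng = np.random.default_rng(0)
f = lambda x: np.sin(3 * x) + 0.1 * rng.standard_normal(np.shape(x))

def rbf(a, b, ls=0.5):
    # Squared-exponential kernel, unit variance, fixed lengthscale.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

X = rng.uniform(0, 3, 3)                  # initial design
y = f(X)
grid = np.linspace(0, 3, 200)             # candidate points

for _ in range(15):
    K = rbf(X, X) + 0.01 * np.eye(len(X))          # observation-noise jitter
    Ks = rbf(grid, X)
    mu = Ks @ np.linalg.solve(K, y)                # GP posterior mean
    var = 1.0 - np.einsum("ij,ij->i", Ks @ np.linalg.inv(K), Ks)
    sd = np.sqrt(np.maximum(var, 1e-12))           # GP posterior stddev

    best = y.min()
    z = (best - mu) / sd
    ei = (best - mu) * norm.cdf(z) + sd * norm.pdf(z)   # expected improvement

    x_next = grid[int(np.argmax(ei))]              # evaluate the best candidate
    X = np.append(X, x_next)
    y = np.append(y, f(x_next))
```

In practice you'd also fit the kernel hyperparameters and use a proper BO library, but the loop above is the whole idea: surrogate posterior, acquisition function, evaluate, repeat.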
Red-Portal t1_j9fqeq2 wrote
Reply to comment by compsci_man in [R] difference between UAI and AISTATS ? by ArmandDerech
Depends on the area of focus. If you're a Bayesian machine learning, statistical learning, or optimization person, AISTATS is the way to go. It's not just about prestige; it's just a better experience. The reviews are less noisy, and the venue itself is more focused. It just feels like home. If you're more of an AI person than an ML person, then AAAI is probably more suited.
Red-Portal t1_j994qi0 wrote
Reply to [R] difference between UAI and AISTATS ? by ArmandDerech
AISTATS tends to be more popular these days, probably due to the conference timing. If you don't want to submit to AAAI, AISTATS is the other option. Also, the review process is much less noisy due to the better focus, and you generally get 5 reviews. In terms of content, they have slightly different flavors. Traditionally, people doing Bayesian nonparametrics have favored UAI, and that still somewhat seems to be the case.
Red-Portal t1_j8p86z3 wrote
Reply to [D] Lion , An Optimizer That Outperforms Adam - Symbolic Discovery of Optimization Algorithms by ExponentialCookie
Do learned optimizer people seriously believe this is the direction we should be going?
Red-Portal t1_j8ke7vj wrote
It's literally called importance sampling in the SGD literature. You normally have to downweight the "important samples" to counter the fact that you're sampling them more often. Whether this practice actually accelerates convergence was an open question in the SGD literature until very recently. Check this paper.
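As a toy illustration of the downweighting (the least-squares setup and the sampling distribution here are made up): if example i is drawn with probability p_i, scaling its gradient by 1/(n p_i) keeps the update an unbiased estimate of the full-batch gradient.

```python
import numpy as np

# Unbiasedness check for importance-sampled SGD on least squares.
rng = np.random.default_rng(0)
n, d = 200, 5
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)
w = rng.standard_normal(d)

grads = A * (A @ w - b)[:, None]          # per-example gradients, shape (n, d)
full_grad = grads.mean(axis=0)            # full-batch gradient

p = np.linalg.norm(A, axis=1)             # sample "important" rows more often
p /= p.sum()

# E[g_i / (n p_i)] = sum_i p_i * g_i / (n p_i) = (1/n) sum_i g_i
expectation = (p[:, None] * grads / (n * p[:, None])).sum(axis=0)
```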
Red-Portal t1_j7ixd0h wrote
Reply to Does the high dimensionality of AI systems that model the real world tell us something about the abstract space of ideas? [D] by Frumpagumpus
High dimensionality does not necessarily mean more complex. In fact, it has been known for quite a while that going to higher dimensions makes various problems easier; non-linearly separable datasets suddenly become separable in higher dimensions, for example. Turning this up to 11, you basically get kernel machines. Kernels embed the data into potentially infinite-dimensional spaces, and they were very successful before deep learning took over.
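The classic toy example is XOR: not linearly separable in 2D, trivially separable after adding a single product feature. A minimal sketch:

```python
import numpy as np

# XOR-style labels over four points: no hyperplane separates them in 2D.
X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)
y = np.array([1, -1, -1, 1])

# Lift to 3D with the feature map phi(x) = (x1, x2, x1*x2).
Phi = np.column_stack([X, X[:, 0] * X[:, 1]])

# In the lifted space the hyperplane with normal (0, 0, 1) separates the
# classes perfectly: the third coordinate *is* the label.
w = np.array([0.0, 0.0, 1.0])
```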
Red-Portal t1_j6gn8u1 wrote
Reply to comment by randomusername11010 in [D] AI Theory - Signal Processing? by a_khalid1999
Correlation is the same as convolution with the kernel flipped.
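A quick NumPy check with a toy signal and kernel:

```python
import numpy as np

# Cross-correlation equals convolution with the kernel reversed.
rng = np.random.default_rng(0)
x = rng.standard_normal(10)
k = rng.standard_normal(3)

corr = np.correlate(x, k, mode="full")
conv = np.convolve(x, k[::-1], mode="full")
```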
Red-Portal t1_j6efht6 wrote
Reply to comment by a_khalid1999 in [D] AI Theory - Signal Processing? by a_khalid1999
That's a more recent trend. Until the late 2000s, computer vision was basically combining machine learning techniques with image processing: Design filters to extract features, and slap them into a classifier. Naturally, lots of Fourier, wavelets, and other weird bases. Very different times.
Red-Portal t1_j6edkjk wrote
Reply to [D] AI Theory - Signal Processing? by a_khalid1999
One of the bull's eye contributions of signal processing to deep learning was this paper. From a signal processing perspective, naive pooling is obviously problematic because you're decimating without limiting the signal bandwidth. That paper showed that in 2019. Shows how much computer vision has changed from an EE-dominant field to a CS field, where signal processing is not common knowledge.
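A 1-D toy sketch of the aliasing argument (the signal and numbers are illustrative): striding a signal whose frequency exceeds the new Nyquist rate folds it into a low frequency at full amplitude, while low-pass filtering first kills it.

```python
import numpy as np

# A sinusoid at normalized frequency 0.45: fine at the original rate, but
# above the Nyquist limit (0.25) of a stride-2 downsampled signal.
t = np.arange(64)
x = np.sin(2 * np.pi * 0.45 * t)

naive = x[::2]                            # plain stride-2 "pooling": aliases

blur = np.array([1.0, 2.0, 1.0]) / 4.0    # small binomial low-pass kernel
smoothed = np.convolve(x, blur, mode="same")
antialiased = smoothed[::2]               # blur, then stride
```

The naive path folds the 0.45 frequency down to 0.1 at nearly full amplitude; the blurred path attenuates it by roughly (1 + cos(2π·0.45))/2 ≈ 0.02 before striding.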
Red-Portal t1_j673lux wrote
Reply to [D] Could forward-forward learning enable training large models with distributed computing? by currentscurrents
> If all your layers are on different machines connected by a high-latency internet connection, this will take a long time.
This is called model parallelism, and this is exactly why you don't want to do it... unless you're forced to. That is, at the scale of current large language monstrosities, the model might not fit on a single node. But other than that, model parallelism is well known to be bad, so people avoid it. Nonetheless, this is a known issue, and lots of work has been done on improving data parallelism with asynchronous updates, like HOGWILD! and Horovod, because we know data parallelism scales better.
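For contrast, here's a minimal simulation of synchronous data parallelism on least squares (the sharding and all numbers are made up): each "worker" holds a shard of the data, and only gradients cross the simulated network, never intermediate activations.

```python
import numpy as np

# Synchronous data-parallel gradient descent: each worker computes a local
# gradient on its shard, and the gradients are averaged (the all-reduce
# step) before the shared parameters are updated.
rng = np.random.default_rng(0)
n, d, workers = 240, 4, 4
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)
shards = np.array_split(np.arange(n), workers)   # equal-size shards

w = np.zeros(d)
for _ in range(300):
    local = [A[i].T @ (A[i] @ w - b[i]) / len(i) for i in shards]
    w -= 0.1 * np.mean(local, axis=0)            # all-reduce + update
```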
Red-Portal t1_j5wgewn wrote
Reply to comment by BigDreamx in [D] Publication Resume by BigDreamx
"Yes, but don't make a fuss about it" is pretty much the guideline.
Red-Portal t1_j5w8thc wrote
Reply to comment by BigDreamx in [D] Publication Resume by BigDreamx
> Authors are allowed to post versions of their work on preprint servers such as arXiv. They are also allowed to give talks to restricted audiences on the work(s) submitted to ICML during the review. If you have posted or plan to post a non-anonymized version of your paper online before the ICML decisions are made, the submitted version must not refer to the non-anonymized version.
> ICML strongly discourages advertising the preprint on social media or in the press while under submission to ICML. Under no circumstances should your work be explicitly identified as ICML submission at any time during the review period, i.e., from the time you submit the paper to the communication of the accept/reject decisions.
Mate, it's stated in the call for papers.
Red-Portal t1_j5w7qtt wrote
Reply to [D] Publication Resume by BigDreamx
Yes, but I believe you mustn't state that it's under consideration for ICML.
Red-Portal t1_j43wds7 wrote
Reply to [D] Has ML become synonymous with AI? by Valachio
You'll have a hard time finding non-ML approaches to AI, but there are still plenty of non-AI applications of ML. For example, classical topics like kernel methods, learning theory, and optimization are all ML topics that are not so AI-flavored.
Red-Portal t1_j2xfe1q wrote
Reply to [Discussion] If ML is based on data generated by humans, can it truly outperform humans? by groman434
I think this is quite an important and fundamental question. Of course, the answer will depend on the task. But in theory, what deep learning is doing is maximum likelihood; that is, it minimizes the average error over the data. Being "average" at the "whole task" is already superhuman most of the time.
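A tiny numeric check of the "maximum likelihood = minimize average error" point, for the simplest case of a Gaussian likelihood, where the optimum is literally the sample average (the data here is made up):

```python
import numpy as np

# Under a Gaussian likelihood, maximizing log-likelihood over the mean is
# exactly minimizing the average squared error, and the minimizer is the
# sample mean.
rng = np.random.default_rng(0)
x = rng.standard_normal(1000) + 3.0

mus = np.linspace(0, 6, 601)
nll = [np.mean((x - m) ** 2) for m in mus]   # avg squared error per candidate
best = mus[int(np.argmin(nll))]
```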
Red-Portal t1_j26qhza wrote
I don't see why one would have to go as far as a PID controller. The relationship between linear dynamical systems and momentum-based SGD algorithms is pretty straightforward. In fact, Lyapunov function-based analysis of SGD algorithms is pretty common.
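A minimal sketch of that relationship (constants are arbitrary): heavy-ball momentum on a 1-D quadratic is a linear dynamical system in the state (w, v), and convergence reduces to the spectral radius of a 2x2 transition matrix, the discrete analogue of a Lyapunov stability argument.

```python
import numpy as np

# Heavy-ball momentum on f(w) = 0.5 * lam * w^2. The update
#   v <- beta * v - lr * lam * w ;  w <- w + v
# is linear in (w, v), so stability is just "spectral radius below one".
lam, lr, beta = 1.0, 0.1, 0.9
M = np.array([[1 - lr * lam, beta],
              [-lr * lam,    beta]])
rho = max(abs(np.linalg.eigvals(M)))      # spectral radius

# Simulate: the iterates decay geometrically when rho < 1.
w, v = 1.0, 0.0
for _ in range(500):
    v = beta * v - lr * lam * w
    w = w + v
```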
Red-Portal t1_j1vu94s wrote
Reply to [D] Has any research been done to counteract the fact that each training datapoint "pulls the model in a different direction", partly undoing learning until shared features emerge? by derpderp3200
I think what you're describing is similar to curriculum learning and importance-sampling SGD. The former claims that there is a better order in which to feed data during SGD that results in better training. But I'm not sure how scientifically grounded that line of research has become; it used to be closer to art. The latter is simple: since some samples are more "destructive" (higher variance), sample them less often while numerically compensating for that.
Red-Portal t1_j1t797h wrote
Reply to [D] ANN for sine wave prediction by T4KKKK
Vanilla ANNs, or MLPs more specifically, are well known to be shit at extrapolating, which is what you're trying to do. There has been some work using periodic activation functions that claims to do better. Try to look for those.
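A toy sketch of why periodic features help, using fixed random sinusoidal features fit by least squares as a stand-in for a trained periodic-activation network (the frequencies, feature count, and ranges are all made up):

```python
import numpy as np

# Random sinusoidal features fit by least squares on a sine wave.
rng = np.random.default_rng(0)
t_train = np.linspace(0, 4 * np.pi, 200)
y_train = np.sin(t_train)

W = rng.uniform(0.5, 1.5, size=50)        # random feature frequencies
b = rng.uniform(0, 2 * np.pi, size=50)    # random phases
feats = lambda t: np.sin(np.outer(t, W) + b)

coef, *_ = np.linalg.lstsq(feats(t_train), y_train, rcond=None)
fit = feats(t_train) @ coef

# Past the training range, these features keep oscillating, whereas
# ReLU/tanh features go flat or linear; that's the claimed advantage.
t_test = np.linspace(4 * np.pi, 6 * np.pi, 100)
pred = feats(t_test) @ coef
```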
Red-Portal t1_j15c4yo wrote
Reply to comment by Deep-Station-1746 in Reduce paramter count in an NN without sacrificing performance [P] by ackbladder_
Not necessarily. If the neural network had dense activations, what you said would be true. But in reality, I don't think the answer is a definite no.
Red-Portal t1_j02wvcs wrote
Reply to [P] Are probabilities from multi-label image classification networks calibrated? by alkaway
With deep neural networks, I would say conformal prediction is the best way to get uncertainty estimates.
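A minimal split-conformal sketch for regression (the model, data, and noise level are made up): held-out calibration residuals give a prediction interval with finite-sample coverage, regardless of how miscalibrated the network's own probabilities are.

```python
import numpy as np

# Split conformal prediction. "model" stands in for any fitted predictor.
rng = np.random.default_rng(0)

def model(x):
    return 2.0 * x

x_cal = rng.uniform(0, 1, 500)                 # held-out calibration set
y_cal = 2.0 * x_cal + rng.normal(0, 0.1, 500)

scores = np.abs(y_cal - model(x_cal))          # calibration residuals
alpha = 0.1                                    # target 90% coverage
k = int(np.ceil((len(scores) + 1) * (1 - alpha)))
q = np.sort(scores)[k - 1]                     # conformal quantile

# Interval for any new x: [model(x) - q, model(x) + q]. Under
# exchangeability this covers the true y with probability >= 1 - alpha.
x_new = 0.3
interval = (model(x_new) - q, model(x_new) + q)
```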
Red-Portal t1_iz67ufu wrote
Reply to comment by chaosmosis in [R] The Forward-Forward Algorithm: Some Preliminary Investigations [Geoffrey Hinton] by shitboots
Yeah there is a whole "zoo" of those things haha.
Red-Portal t1_iz3u8k2 wrote
Reply to comment by kebabmybob in [R] The Forward-Forward Algorithm: Some Preliminary Investigations [Geoffrey Hinton] by shitboots
Oh I'm not saying you should just remove the footnotes. I'm saying it's better to blend them into the main text so I don't have to jump back and forth...
Red-Portal t1_jc5gmb0 wrote
Reply to comment by currentscurrents in [D]: Generalisation ability of autoencoders by Blutorangensaft
Can't remember the ones on compression, but for quantization I was talking about this paper.