Red-Portal

Red-Portal t1_jc4u84k wrote

what do you mean by generalizing here? Reconstruction of OOD data? Ironically, VAEs are pissing everybody off because they reconstruct OOD data *too* well. In fact, one of the things people are dying to get working is anomaly detection or OOD detection, but VAEs suck at it despite all attempts. Like a dog that can't guard the house because he really likes strangers, VAEs suck at OOD detection because they reconstruct OOD data too well.
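To make the recipe concrete, here's a toy sketch of reconstruction-error OOD scoring, with a linear autoencoder (PCA) standing in for the VAE. Everything here is made up for illustration; in this linear toy the score *does* separate ID from OOD, which is exactly what deep VAEs often fail to do, because they reconstruct OOD inputs well too.

```python
import numpy as np

rng = np.random.default_rng(0)

# In-distribution data lives on a 4-dim subspace of a 10-dim space.
A = rng.normal(size=(4, 10))
X_train = rng.normal(size=(500, 4)) @ A
mu = X_train.mean(axis=0)
_, _, Vt = np.linalg.svd(X_train - mu, full_matrices=False)
W = Vt[:4]                                     # 4-dim latent code

def anomaly_score(x):
    z = (x - mu) @ W.T                         # encode
    x_hat = z @ W + mu                         # decode
    return np.sum((x - x_hat) ** 2, axis=-1)   # reconstruction error = score

x_id = rng.normal(size=(100, 4)) @ A           # more in-distribution data
x_ood = rng.normal(size=(100, 10))             # full-rank OOD data
print(anomaly_score(x_id).mean(), anomaly_score(x_ood).mean())
```

The pipeline is the standard one: flag inputs whose reconstruction error exceeds a threshold. The complaint in the comment is that a trained VAE assigns low scores to OOD inputs as well, so the threshold stops meaning anything.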

5

Red-Portal t1_jakr3yf wrote

The fundamental problem with evolutionary strategies is that they are a freakin nightmare to evaluate. It's basically impossible to reason about their mathematical properties, experiments are noisy as hell, and how representative are the benchmark objective functions anyway? It's just really hard to do good science with them, which means it's hard to make concrete improvements. Sure, once upon a time they were the only choice for noisy, gradient-free global optimization problems. But now we have Bayesian optimization.
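For anyone who hasn't seen one: here's a minimal (1+1) evolution strategy on a noisy sphere function, a toy of my own making, just to show where the evaluation headache comes from. Even the accept/reject comparison is made on noisy function values.

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_sphere(x):
    return np.sum(x**2) + 0.1 * rng.normal()   # every evaluation is stochastic

x = np.ones(5)
fx = noisy_sphere(x)
sigma = 0.3                                    # mutation step size
for _ in range(500):
    cand = x + sigma * rng.normal(size=x.shape)  # Gaussian mutation
    f_cand = noisy_sphere(cand)
    if f_cand < fx:                            # greedy selection on noisy values
        x, fx = cand, f_cand

print(np.sum(x**2))   # true objective at the returned point
```

Note that the algorithm can "accept" a worse point that got a lucky noise draw, and any benchmark result depends on the noise realization, which is exactly why experiments with these methods are so hard to interpret.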

2

Red-Portal t1_j9fqeq2 wrote

Depends on the area of focus. If you're a Bayesian machine learning, statistical learning, or optimization person, AISTATS is the way to go. It's not just about prestige; it's just a better experience. The reviews are less noisy, and the venue itself is more focused. It just feels like home. If you're more of an AI person than an ML person, then AAAI is probably better suited.

2

Red-Portal t1_j994qi0 wrote

AISTATS tends to be more popular these days, probably due to the conference timing. If you don't want to submit to AAAI, AISTATS is the other option. Also, the review process is much less noisy due to the tighter focus, and you generally get 5 reviews. In terms of content, they have slightly different flavors: traditionally, people doing Bayesian nonparametrics have favored UAI, and that still somewhat seems to be the case.

2

Red-Portal t1_j7ixd0h wrote

High dimensionality does not necessarily mean more complex. In fact, it has been known for quite a while that going to higher dimensions makes various problems easier; non-linearly separable datasets suddenly become separable in higher dimensions, for example. Turning this up to 11, you basically get kernel machines. Kernels embed the data into potentially infinite-dimensional spaces, and that was very successful before deep learning took over.
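The classic demonstration: two concentric rings are hopeless for a linear classifier in 2-D, but adding one extra dimension with the feature x1² + x2² makes them separable by a plane. This is a toy sketch with made-up numbers, the same trick an RBF/polynomial kernel does implicitly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two concentric rings (radii 1 and 3): not linearly separable in 2-D.
n = 200
theta = rng.uniform(0, 2*np.pi, n)
r = np.concatenate([np.full(n//2, 1.0), np.full(n//2, 3.0)])
X = np.c_[r*np.cos(theta), r*np.sin(theta)] + 0.05*rng.normal(size=(n, 2))
y = np.concatenate([-np.ones(n//2), np.ones(n//2)])

# Lift to 3-D with the feature z = x1^2 + x2^2: now a plane separates them.
Z = np.c_[X, X[:, 0]**2 + X[:, 1]**2]
# A simple threshold on the new coordinate (between 1^2 and 3^2) classifies perfectly.
pred = np.where(Z[:, 2] > 5.0, 1.0, -1.0)
print((pred == y).mean())
```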

3

Red-Portal t1_j6efht6 wrote

That's a more recent trend. Until the late 2000s, computer vision was basically about combining machine learning techniques with image processing: design filters to extract features, then feed them into a classifier. Naturally, lots of Fourier, wavelets, and other weird bases. Very different times.
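That pipeline in miniature, as a hedged sketch (the filter, images, and "classifier" are all made up): a hand-designed Sobel filter turns each image into a scalar edge-energy feature, and classification is just a threshold on that feature.

```python
import numpy as np

# Hand-designed filter -> feature extraction -> (trivial) classifier.
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)

def edge_energy(img):
    # Valid 2-D correlation with the Sobel filter, then one scalar feature.
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(img[i:i+3, j:j+3] * sobel_x)
    return np.mean(np.abs(out))

smooth = np.full((16, 16), 0.5)              # flat image: zero edge energy
striped = np.tile([0., 0., 1., 1.], (16, 4)) # vertical stripes: high edge energy
print(edge_energy(smooth), edge_energy(striped))
# "Classifier": call it textured if edge_energy > some threshold.
```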

8

Red-Portal t1_j6edkjk wrote

One of the bull's-eye contributions of signal processing to deep learning was this paper. From a signal processing perspective, naive pooling is obviously problematic because you're decimating without limiting the signal's bandwidth first. That paper showed this in 2019, which shows how much computer vision has changed from an EE-dominated field to a CS field where signal processing is no longer common knowledge.
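A 1-D sketch of the aliasing issue (toy signal and filter of my choosing, not the paper's exact setup): stride-2 decimation of a Nyquist-frequency signal is wildly shift-dependent, while low-pass filtering first, i.e. "blur, then subsample", makes the result stable under shifts.

```python
import numpy as np

x = np.tile([1.0, 0.0], 8)      # 1,0,1,0,... the worst-case signal

def naive_pool(s):
    return s[::2]               # decimate without band-limiting: aliasing

def blur_pool(s):
    # Low-pass with a small binomial filter first, then decimate.
    return np.convolve(s, [0.25, 0.5, 0.25], "same")[::2]

print(naive_pool(x), naive_pool(np.roll(x, 1)))        # all ones vs. all zeros
print(blur_pool(x)[1:], blur_pool(np.roll(x, 1))[1:])  # interiors agree at 0.5
```

Shifting the input by one sample flips the naive-pooled output from all ones to all zeros, exactly the shift-variance that paper diagnosed in CNN max/average pooling.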

7

Red-Portal t1_j673lux wrote

> If all your layers are on different machines connected by a high-latency internet connection, this will take a long time.

This is called model parallelism, and this is exactly why you don't want to do it... unless you're forced to. That is, at the scale of current large language monstrosities, the model might not fit on a single node. But other than that, model parallelism is well known to be bad, so people avoid it. Nonetheless, this is a known issue, and lots of work has gone into improving data parallelism with asynchronous updates, like HOGWILD! and Horovod, because we know data parallelism scales better.
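Here's what data parallelism looks like in miniature (a least-squares toy of my own, with the workers simulated sequentially): each "worker" holds a shard of the batch and computes a local gradient, and only the small gradient vectors get communicated and averaged. No layer ever waits on activations from another machine.

```python
import numpy as np

rng = np.random.default_rng(0)

X = rng.normal(size=(64, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

def local_grad(w, Xs, ys):
    # Least-squares gradient on one worker's shard.
    return 2 * Xs.T @ (Xs @ w - ys) / len(ys)

w = np.zeros(3)
shards = np.array_split(np.arange(64), 4)    # 4 workers, disjoint data shards
for _ in range(200):
    grads = [local_grad(w, X[idx], y[idx]) for idx in shards]  # parallel in reality
    w -= 0.1 * np.mean(grads, axis=0)        # "all-reduce": average, then step

print(w)   # converges toward w_true
```

The synchronous average here is the part HOGWILD!-style methods relax: workers apply updates asynchronously instead of waiting for the all-reduce.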

19

Red-Portal t1_j5w8thc wrote

Reply to comment by BigDreamx in [D] Publication Resume by BigDreamx

> Authors are allowed to post versions of their work on preprint servers such as arXiv. They are also allowed to give talks to restricted audiences on the work(s) submitted to ICML during the review. If you have posted or plan to post a non-anonymized version of your paper online before the ICML decisions are made, the submitted version must not refer to the non-anonymized version.

> ICML strongly discourages advertising the preprint on social media or in the press while under submission to ICML. Under no circumstances should your work be explicitly identified as ICML submission at any time during the review period, i.e., from the time you submit the paper to the communication of the accept/reject decisions.

Mate, it's stated in the call for papers.

5

Red-Portal t1_j43wds7 wrote

You'll have a hard time finding non-ML approaches to AI, but there are still plenty of non-AI applications of ML. For example, classical topics like kernel methods, learning theory, and optimization are all ML topics that are not so AI-flavored.

11

Red-Portal t1_j1vu94s wrote

I think what you're describing is similar to curriculum learning and importance-sampling SGD. The former claims there is a better order in which to feed data during SGD that results in better training, though I'm not sure how scientifically grounded that line of research has become; it used to be closer to art. The latter is simple: since some samples are more "destructive" (higher variance), sample them less often while numerically compensating for that.
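The compensation part is the key trick, so here's a sketch. The sampling probabilities below are a made-up proxy (inverse gradient-norm), but the reweighting by 1/(n·p_i) is the generic recipe that keeps the expected update equal to the full-batch gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 100
X = rng.normal(size=(n, 2))
y = X @ np.array([2.0, -1.0])

# Hypothetical sampling distribution: down-weight high-variance (large-norm) samples.
scores = 1.0 / (1.0 + np.linalg.norm(X, axis=1))
p = scores / scores.sum()

w = np.zeros(2)
for _ in range(3000):
    i = rng.choice(n, p=p)                  # sample "destructive" points less often
    g = 2 * X[i] * (X[i] @ w - y[i])        # per-sample gradient
    w -= 0.02 * g / (n * p[i])              # weight 1/(n p_i) keeps E[update] unbiased

print(w)   # approaches [2, -1]
```

The unbiasedness check is one line: E[g_i / (n p_i)] = Σ_i p_i · g_i / (n p_i) = (1/n) Σ_i g_i, the full-batch gradient.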

1

Red-Portal t1_j1t797h wrote

Vanilla ANNs, or MLPs more specifically, are well known to be shit at extrapolating, which is what you're trying to do. There have been some works using periodic activation functions that claim to do better. Try looking for those.
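A toy of the idea (not any published method; the frequencies and phases are fixed random values I made up): a one-layer "network" with sin activations, where only the linear output layer is fit. Unlike a ReLU MLP, whose output goes affine far from the data, the hypothesis class itself is periodic, so it at least has a chance outside the training range.

```python
import numpy as np

rng = np.random.default_rng(0)

def sin_features(x, W, b):
    return np.sin(np.outer(x, W) + b)        # one hidden layer of sin units

W = rng.uniform(0.5, 3.0, 64)                # random fixed frequencies
b = rng.uniform(0.0, 2*np.pi, 64)            # random fixed phases

x_train = np.linspace(0, 4*np.pi, 200)
y_train = np.sin(2*x_train)                  # periodic target

# Fit only the output weights by ridge regression on the sin features.
Phi = sin_features(x_train, W, b)
alpha = np.linalg.solve(Phi.T @ Phi + 1e-3*np.eye(64), Phi.T @ y_train)

x_test = np.linspace(4*np.pi, 6*np.pi, 100)  # beyond the training range
train_err = np.abs(Phi @ alpha - y_train).max()
test_err = np.abs(sin_features(x_test, W, b) @ alpha - np.sin(2*x_test)).max()
print(train_err, test_err)
```

How well the extrapolation actually works depends heavily on the frequencies and the regularization; the point is only structural: a periodic basis can represent periodic behavior outside the data, while piecewise-linear activations cannot.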

3