
olmec-akeru t1_jck223w wrote

Yeah, totally right—and I understand that the specifics really matter in some cases (for example calculating a starship trajectory).

What intrigues me is that, at the level of concept and logic, this specific error isn't meaningful: i.e. if the sum of the three primes had initially been correct, the approach wouldn't have been invalid. There is something in this.

1

olmec-akeru t1_jcjw333 wrote

Right, so ignoring the specific error and thinking about the general approach: it adds a^3 as a fourth term, and it happens that a = 0.

Sneaky, but not illogical.

Edit: the above is wrong; read the thread below for the OP's insights.

1

olmec-akeru OP t1_iy8ajq0 wrote

> the beauty of the PCA reduction was that one dimension was responsible for the size of the nose

You posit that an eigenvector will represent the nose when there are meaningful variations of scale, rotation, and position?

This is very different from saying that all variance will be explained across the full set of eigenvectors (which is very much true).
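
To make that distinction concrete, here's a minimal sketch (assuming numpy and scikit-learn, with random data standing in for the face dataset): the full set of components accounts for essentially all the variance, but nothing ties any single component to one semantic feature.

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(200, 10)   # stand-in for the face data
pca = PCA().fit(X)            # keep every component

# The full set of eigenvectors explains (essentially) all the variance...
print(pca.explained_variance_ratio_.sum())   # ~1.0

# ...but there is no guarantee that any single component
# corresponds to one visual feature like "nose size".
```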

1

olmec-akeru OP t1_iy7i546 wrote

Heya! Appreciate the discourse, it's awesome!

As a starting point, I've shared the rough description of the t-SNE algorithm from Wikipedia:

> The t-SNE algorithm comprises two main stages. First, t-SNE constructs a probability distribution over pairs of high-dimensional objects in such a way that similar objects are assigned a higher probability while dissimilar points are assigned a lower probability. Second, t-SNE defines a similar probability distribution over the points in the low-dimensional map, and it minimizes the Kullback–Leibler divergence (KL divergence) between the two distributions with respect to the locations of the points in the map. While the original algorithm uses the Euclidean distance between objects as the base of its similarity metric, this can be changed as appropriate.

So the algorithm is definitely minimising the KL divergence. In minimising the KLD between the two distributions, it is looking for a mapping in which dissimilar points end up further apart in the embedding space.
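
As a rough illustration of that description (just a sketch, assuming numpy and scikit-learn; `P` and `Q` stand for the pairwise-similarity distributions the quote mentions):

```python
import numpy as np
from sklearn.manifold import TSNE

X = np.random.rand(100, 50)   # high-dimensional points

# sklearn's t-SNE builds P (high-dim similarities) and Q (low-dim
# similarities) internally and minimises KL(P || Q) over the embedding.
embedding = TSNE(n_components=2, metric="euclidean", perplexity=30).fit_transform(X)

# The objective being minimised, for two pairwise-probability matrices:
def kl_divergence(P, Q, eps=1e-12):
    return np.sum(P * np.log((P + eps) / (Q + eps)))
```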

2

olmec-akeru OP t1_iy7ai6s wrote

> beauty of the PCA reduction was that one dimension was responsible for the size of the nose

I don't think this always holds true. You're just lucky that your dataset contains sufficiently confined variation that the eigenvectors map this variance onto a visual feature. There is no mathematical property of PCA that makes your statement true.

There have been some attempts to formalise something like what you have described. The closest I've seen is the beta-VAE: https://lilianweng.github.io/posts/2018-08-12-vae/
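
For reference, the core of the beta-VAE objective from that post looks roughly like this (a sketch, assuming a PyTorch encoder/decoder that produces `x_recon`, `mu`, and `logvar`; names are illustrative):

```python
import torch
import torch.nn.functional as F

def beta_vae_loss(x, x_recon, mu, logvar, beta=4.0):
    # Reconstruction term plus a beta-weighted KL to the unit-Gaussian prior;
    # beta > 1 is what pushes the latent dimensions towards disentanglement.
    recon = F.mse_loss(x_recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl
```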

2