Abradolf--Lincler t1_jc8ynrt wrote

I'm learning about language transformers and I'm a bit confused.

The tutorials on transformers always seem to make the input sequences the same length (e.g., text files chunked into 100-word windows) to simplify batching.

Doesn't that mean the model will only work at that exact sequence length? How do you efficiently train a model that handles any length, such as shorter sequences without padding, or sequences longer than the window used for batching?
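
To make the question concrete, here's a minimal sketch of the padding-plus-attention-mask approach I think the tutorials are gesturing at (PyTorch; the token ids, vocab size, and model dimensions are all made up):

```python
import torch
import torch.nn as nn

# Toy batch: three sequences of different lengths (token ids are placeholders).
seqs = [torch.randint(1, 100, (n,)) for n in (5, 9, 3)]

PAD = 0
max_len = max(len(s) for s in seqs)

# Right-pad every sequence to the longest one in this batch.
batch = torch.full((len(seqs), max_len), PAD, dtype=torch.long)
for i, s in enumerate(seqs):
    batch[i, : len(s)] = s

# True where a position is padding, so attention ignores those tokens.
pad_mask = batch == PAD

embed = nn.Embedding(num_embeddings=100, embedding_dim=32, padding_idx=PAD)
layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)

out = encoder(embed(batch), src_key_padding_mask=pad_mask)
print(out.shape)  # (3, max_len, 32); masked positions don't affect the rest
```

As I understand it, each batch only pads up to its own longest sequence, so different batches can use different lengths. Is that the right idea?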

I also see attention models advertised as having an infinite context window. Are there any good resources/tutorials that explain how to build a model like that?

Abradolf--Lincler t1_itoo2lh wrote

I am using PointNet.

I have a point cloud segmentation problem. My training data has one foreground class, but on average only ~4% of the points in each cloud belong to it, and those points are usually grouped together (they form a single object).

How do I balance this?

If I remove most of the points that aren't in the class to balance it, the cloud becomes sparse and it would be trivially easy to spot where the class is: balancing 1:1 keeps the ~4% positive points plus an equal number of negatives, so only ~8% of the original points remain.

Or is there a way to train this well without balancing the training data?
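
For context, is per-class loss weighting the kind of thing that would work instead of resampling? A minimal PyTorch sketch of what I have in mind (the shapes mimic a PointNet-style per-point segmentation head; the 25x weight is just a placeholder derived from the ~4% class ratio):

```python
import torch
import torch.nn as nn

# Fake per-point logits, as a PointNet-style segmentation head would emit:
# (batch, num_points, num_classes), with class 0 = background, 1 = object.
logits = torch.randn(4, 2048, 2, requires_grad=True)
labels = torch.randint(0, 2, (4, 2048))  # real labels would be ~4% ones

# Up-weight the rare class instead of dropping background points.
# With ~4% positives, 1 / 0.04 = 25 is a rough starting point to tune.
weights = torch.tensor([1.0, 25.0])
criterion = nn.CrossEntropyLoss(weight=weights)

# CrossEntropyLoss expects (N, C, ...), so move the class dim to position 1.
loss = criterion(logits.permute(0, 2, 1), labels)
loss.backward()
print(loss.item())
```

Would that be enough on its own, or is it usually combined with something like a focal or Dice loss?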

Thanks!
