-pkomlytyrg t1_jeeeo36 wrote
Reply to comment by --dany-- in [D] Best deal with varying number of inputs each with variable size using and RNN? (for an NLP task) by danilo62
I would embed the whole post (BigBird or OpenAI embeddings have really long context lengths) and just feed that vector into an RNN. As long as the post is between one and 9000 tokens, the embedding shape will stay the same
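A minimal sketch of that idea, assuming a fixed-dimensional embedder (the `embed` stub below is a hypothetical stand-in for a BigBird/OpenAI embedding call, not a real API) and a hand-rolled Elman RNN cell just to show the shapes line up:

```python
import numpy as np

EMB_DIM = 64  # real embedders are larger (e.g. 1536 for ada-002); assumption here

def embed(text: str) -> np.ndarray:
    """Stand-in for a long-context embedding model: any-length text in,
    fixed-size vector out. A real version would call the model's API."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(EMB_DIM)

# Short or long, every post maps to the same-shaped vector...
posts = ["short post", "a much longer post " * 200]
vectors = np.stack([embed(p) for p in posts])   # shape: (num_posts, EMB_DIM)

# ...so a sequence of post vectors can drive a simple RNN cell.
HID = 32
rng = np.random.default_rng(0)
W_xh = rng.standard_normal((EMB_DIM, HID)) * 0.1
W_hh = rng.standard_normal((HID, HID)) * 0.1

h = np.zeros(HID)
for v in vectors:                               # one RNN step per post
    h = np.tanh(v @ W_xh + h @ W_hh)

print(vectors.shape, h.shape)
```

The point is just that the RNN never sees variable-size inputs: the embedder absorbs all the length variation.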
-pkomlytyrg t1_jef4weq wrote
Reply to comment by danilo62 in [D] Best deal with varying number of inputs each with variable size using and RNN? (for an NLP task) by danilo62
Generally, yes. If you use a model with a long context length (BigBird or OpenAI's ada-002), you'll likely be fine unless the articles you're embedding exceed the token limit. If you're using BERT or another, smaller model, you have to chunk and average; that can produce fixed-size vectors, but you gotta put the work in haha
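The chunk-and-average workaround can be sketched like this (the `embed_chunk` stub is a hypothetical placeholder for a real BERT forward pass plus pooling, and the whitespace split is a crude stand-in for a proper tokenizer):

```python
import numpy as np

EMB_DIM = 64      # assumption; BERT-base would give 768
MAX_TOKENS = 512  # BERT's context limit

def embed_chunk(tokens: list[str]) -> np.ndarray:
    """Stand-in for embedding one chunk that fits in the model's context."""
    rng = np.random.default_rng(abs(hash(" ".join(tokens))) % (2**32))
    return rng.standard_normal(EMB_DIM)

def embed_long(text: str) -> np.ndarray:
    """Split an over-long text into context-sized chunks, embed each,
    and average: a fixed-size vector regardless of input length."""
    tokens = text.split()  # crude tokenization, for illustration only
    chunks = [tokens[i:i + MAX_TOKENS]
              for i in range(0, len(tokens), MAX_TOKENS)]
    return np.mean([embed_chunk(c) for c in chunks], axis=0)

short_vec = embed_long("tiny article")
long_vec = embed_long("word " * 5000)   # ~10 chunks
print(short_vec.shape, long_vec.shape)
```

Mean pooling is the simplest aggregation; weighting chunks or feeding the per-chunk vectors to the RNN as a sequence are common variants.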