Spiritual-Reply5896
Spiritual-Reply5896 t1_jckq519 wrote
Reply to comment by super_deap in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
Let's say the Linux kernel manual is embedded as memories. If we can get an accurate semantic representation of the question, then we should be able to find the relevant context in memory, and use just enough of it to answer the question in far fewer tokens than providing the whole Linux manual as context. If we assume that computing attention is as fast as vector search, then it's a no-brainer that retrieving only the relevant context from memory is a better approach than using the whole manual. It's of course a trade-off between accuracy and speed/scalability, but I argue it's a good trade-off, as text often isn't that information-dense.
The ability to produce semantically coherent embeddings from text is the bread and butter of LLMs, so why would it be any bigger a problem to retrieve these memories from an external / infinite database than from the context window?
I'm just hypothesizing with my limited knowledge, please correct me if I make stupid assumptions :)
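To make the retrieval idea above concrete, here is a minimal sketch, not a real implementation. Everything in it is an assumption for illustration: `embed` is a toy bag-of-words stand-in for a learned embedding model, the `memories` list is a handful of made-up one-line snippets rather than the actual kernel manual, and `retrieve` is a hypothetical helper doing a brute-force cosine-similarity search instead of a proper vector index.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a learned embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, memories, k=2):
    """Return the k memories most similar to the question (brute force)."""
    q = embed(question)
    ranked = sorted(memories, key=lambda m: cosine(q, embed(m)), reverse=True)
    return ranked[:k]


# Made-up snippets standing in for chunks of a large manual.
memories = [
    "The scheduler decides which process runs next on each CPU.",
    "Loadable kernel modules are inserted with insmod or modprobe.",
    "The virtual file system layer abstracts over concrete file systems.",
    "Memory pages are reclaimed by the kswapd daemon under memory pressure.",
]

question = "How do I load a kernel module?"
context = retrieve(question, memories, k=1)

# Only the retrieved chunk goes into the prompt, not every memory.
prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question
full_prompt = "Context:\n" + "\n".join(memories) + "\n\nQuestion: " + question
```

Here the only word the question shares with any snippet is "kernel", so the module snippet wins. A bag-of-words vector cannot tell that "load" and "loadable" are related, which is exactly the part a real embedding model would have to get right for this to work at scale.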
Spiritual-Reply5896 t1_jcjz3fn wrote
Why is everyone talking about context length, and not about some kind of memory retrieval? Is the assumption that by increasing context length we can eventually scale it to infinity, thus replacing any kind of external memory?
Spiritual-Reply5896 t1_jc5s7ew wrote
Reply to [D] ChatGPT without text limits. by spiritus_dei
How is the similarity between synonyms or semantically similar sentences ensured if regex is used for retrieving the input prompts? Maybe I missed something as I only skimmed the paper, but that was the impression I got.
Spiritual-Reply5896 t1_iwn1g6i wrote
How would you say it compares to FiftyOne? Are your goals the same as their project's?
Spiritual-Reply5896 t1_iw8yhoi wrote
Reply to comment by uncooked-cookie in [D] Is there an advantage in learning when taking the average Gradient compared to the Gradient of just one point by CPOOCPOS
It gives you a local improvement direction, but can we straightforwardly think about this metaphor of improvement in 3D and generalize it to thousands of dimensions?
Maybe it's a slightly different question, but do you happen to know where to find research on how well mathematical operations that are interpretable in low-dimensional geometry generalize to extremely high dimensions? I'm not looking for theory on vector spaces, but for the intuitive aspects.
Spiritual-Reply5896 t1_jcsq4d9 wrote
Reply to comment by 127-0-0-1_1 in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
Exactly, I wanted to find out whether there is some research regarding these embeddings. I really think that by efficient pruning/organization of these "memories" it's possible to build quite an advanced memory. Things like embedding consistency then become a big factor: how much does length affect the embedding, what is the optimal information content vs. string size...