Spiritual-Reply5896 t1_jcsq4d9 wrote on March 19, 2023 at 7:22 AM

Reply to comment by 127-0-0-1_1 in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap

Exactly, I wanted to find out whether there is any research on these embeddings. I really think that with efficient pruning/organization of these "memories" it's possible to build quite an advanced memory. Things like embedding consistency then become a big factor: how much does length affect the embedding, what is the optimal information content vs. string size...

Spiritual-Reply5896 t1_jckq519 wrote on March 17, 2023 at 3:20 PM

Reply to comment by super_deap in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap

Let's say the Linux kernel manual is embedded as memories. If we can get an accurate semantic representation of the question, then we should be able to find the relevant context in memory and use just enough of it to answer the question in fewer tokens than providing the whole Linux manual as context. If we assume that computing attention is as fast as vector search, then it's a no-brainer that retrieving only the relevant context from memory is a better approach than using the whole manual. It's of course a trade-off between accuracy and speed/scalability, but I'd argue it's a good trade-off, as text often isn't that information-dense.
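To make the retrieval idea above concrete, here is a minimal sketch of "embed the chunks, embed the question, pull the closest chunks". Everything in it is an assumption for illustration: `embed` is a toy hashed bag-of-words stand-in for a real embedding model, `retrieve` is a hypothetical helper, and the three "manual" chunks are made up.

```python
import re
import zlib

import numpy as np


def embed(text: str, dim: int = 1024) -> np.ndarray:
    """Toy stand-in for a real embedding model: hashed bag of words, L2-normalized."""
    vec = np.zeros(dim)
    for word in re.findall(r"\w+", text.lower()):
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        vec[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def retrieve(query: str, memories: list[str], k: int = 2) -> list[str]:
    """Return the k stored chunks with the highest cosine similarity to the query."""
    memory_matrix = np.stack([embed(m) for m in memories])
    # Vectors are unit length, so the dot product equals cosine similarity.
    scores = memory_matrix @ embed(query)
    top = np.argsort(scores)[::-1][:k]
    return [memories[i] for i in top]


# Made-up chunks standing in for an embedded manual.
memories = [
    "mount attaches a filesystem to the directory tree",
    "the scheduler decides which process runs next",
    "iptables configures packet filtering rules",
]

context = retrieve("how to mount a filesystem", memories, k=1)
print(context)
```

Only the retrieved chunk would then go into the prompt, instead of the whole manual. With a real embedding model the matching would be semantic rather than word overlap, which is the part this toy version does not capture.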

The ability to produce semantically coherent embeddings from text is the bread and butter of LLMs, so why would it be any bigger a problem to retrieve these memories from an external / infinite database than from the context window?

I'm just hypothesizing with my limited knowledge, please correct me if I'm making stupid assumptions :)

Spiritual-Reply5896 t1_jcjz3fn wrote on March 17, 2023 at 11:48 AM

Reply to [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap

Why is everyone talking about context length and not about some kind of memory retrieval? Is the assumption that by increasing context length we can eventually scale it to infinity, thus replacing any kind of external memory?

Spiritual-Reply5896 t1_jc5s7ew wrote on March 14, 2023 at 6:34 AM

Reply to [D] ChatGPT without text limits. by spiritus_dei

How is similarity between synonyms or semantically similar sentences ensured if regex is used to retrieve the input prompts? Maybe I missed something, as I only skimmed the paper, but that was the impression I got.

Spiritual-Reply5896 t1_iwn1g6i wrote on November 16, 2022 at 9:13 PM

Reply to [P] Kangas V1 - Open source EDA tool for large, multimedia datasets by calebkaiser

How would you say it compares to FiftyOne? Are your goals the same as that project's?

Spiritual-Reply5896 t1_iw8yhoi wrote on November 13, 2022 at 9:24 PM

Reply to comment by uncooked-cookie in [D] Is there an advantage in learning when taking the average Gradient compared to the Gradient of just one point by CPOOCPOS

It gives you a local improvement direction, but can we straightforwardly take this metaphor of improvement in 3D and generalize it to thousands of dimensions?
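As a small sanity check on the "local improvement direction" part, here is a sketch that takes one small step against the gradient in 3 and in 1000 dimensions and checks that the function value drops. The bowl-shaped test function, the step size, and the helper name `improves` are all assumptions picked for illustration; this only covers the local first-order claim, not the broader question of geometric intuition.

```python
import numpy as np


def f(x: np.ndarray) -> float:
    # Simple bowl-shaped test function, chosen only for illustration.
    return 0.5 * float(np.sum(x ** 2))


def grad_f(x: np.ndarray) -> np.ndarray:
    # Gradient of 0.5 * ||x||^2 is x itself.
    return x


def improves(dim: int, step: float = 0.1, seed: int = 0) -> bool:
    """Take one small step against the gradient and report whether f decreased."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=dim)
    x_new = x - step * grad_f(x)
    return f(x_new) < f(x)


for dim in (3, 1000):
    print(dim, improves(dim))
```

The dimension never enters the argument: for a small enough step, the first-order change in f is minus the step size times the squared gradient norm, which is negative whenever the gradient is nonzero.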

Maybe it's a slightly different question, but do you happen to know where to find research on how well mathematical operations from interpretable geometric dimensions generalize to extremely high dimensions? I'm not looking for theory on vector spaces, but for the intuitive aspects.