Deep-Station-1746
Deep-Station-1746 t1_je8u12c wrote
Reply to [R] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention by floppy_llama
This is interesting - compared to LoRA, it also lets LLaMA accept images as inputs. And I believe it is orthogonal to LoRA, meaning they could possibly be used together. I'm unsure about the training stability, though. I know that LoRA training tolerates ridiculously high learning rates (1e-5 for the text encoder), especially for DreamBooth. Combining LoRA on the frozen weights with the LLaMA-Adapter is an interesting thing to explore.
Edit: spelling
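For context, the LoRA trick being discussed is just a low-rank additive update on a frozen weight matrix, which is why it composes with other adapter methods. A minimal numpy sketch (all shapes, names, and hyperparameter values here are illustrative, not taken from either paper):

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r = 8, 8, 2  # r << d is the low-rank bottleneck
alpha = 16                # LoRA scaling hyperparameter

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero-init

def lora_forward(x):
    # y = Wx + (alpha/r) * B(Ax); only A and B would receive gradients
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B zero-initialized, the adapted model starts out identical to the
# frozen one -- the same spirit as LLaMA-Adapter's zero-init attention gate.
assert np.allclose(lora_forward(x), W @ x)
```

Since the update only touches A and B, nothing stops you from bolting adapter layers onto the same frozen backbone at the same time.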
Deep-Station-1746 t1_jdzlqrw wrote
Reply to [D] With ML tools progressing so fast, what are some ways you've taken advantage of them personally? by RedditLovingSun
Oh, I've used it to <insert something I previously used Google for>. It's great.
Deep-Station-1746 t1_jdz5z9w wrote
Reply to [D] Is French the most widely used language in ML circles after English? If not, what are some useful (natural) languages in the field of machine learning? by Subject_Ad_9680
Pythonese is quite useful, from what I hear. Especially the Torchese dialect.
Deep-Station-1746 t1_jduxvm4 wrote
Reply to comment by [deleted] in [D] Build a ChatGPT from zero by manuelfraile
You know what? Provide me with 2-3 sample "good" responses to the above post, explain why they make for a better response than what I wrote, and I'll actually use them from now on to respond to low-effort posts from this sub.
Deep-Station-1746 t1_jduhmbg wrote
Reply to [D] Definitive Test For AGI by jabowery
Sir, this is r/MachineLearning. May I take your quality contribution?
Deep-Station-1746 t1_jduhbth wrote
Reply to [D] Build a ChatGPT from zero by manuelfraile
OP is peaking on the Dunning-Kruger curve right now.
Deep-Station-1746 t1_jdn3vxg wrote
Reply to [N] GPT-4 has 1 trillion parameters by mrx-ai
If you say so.
Deep-Station-1746 t1_jdlmrh5 wrote
Reply to comment by learn-deeply in [R] Reflexion: an autonomous agent with dynamic memory and self-reflection - Noah Shinn et al 2023 Northeastern University Boston - Outperforms GPT-4 on HumanEval accuracy (0.67 --> 0.88)! by Singularian2501
This is actually very good PR material, as it will save engineers' time. Just opened one and referenced your comment. https://github.com/noahshinn024/reflexion-human-eval/pull/1
Deep-Station-1746 t1_jdhhbbg wrote
Reply to [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-
Nope. The ability to take something as input doesn't mean being able to use it reliably. Take this post, for example: your eyes can take in all the info on the screen, but as a contribution, this post is pretty worthless. And you are a lot smarter than GPT-4, I think.
Edit: spelling
Deep-Station-1746 t1_jcamy6n wrote
Reply to comment by OptimizedGarbage in [D] What do people think about OpenAI not releasing its research but benefiting from others’ research? Should google meta enforce its patents against them? by [deleted]
Patenting dropout feels a lot like NFTs - it's useless. So why bother?
Edit:
What I don't understand is how anyone can prove that someone is multiplying matrices together in some particular way, as long as they don't admit to it themselves.
That's like patenting a thought. If you think about a particular patented pair of pants™, can you be sued for propagating a patented neural activity through your bio network? It's absurd.
Deep-Station-1746 t1_jc196cg wrote
>25% improvement over Whisper
>Not open source
>doubt.jpeg
Deep-Station-1746 t1_j91egc2 wrote
Reply to [D] Please stop by [deleted]
Isn't this kind of high-quantity-low-quality trend inevitable after some threshold popularity of the base topic? Is there any reason to try to fight the inevitable, instead of forming more niche, less popular communities?
Deep-Station-1746 t1_j146uw3 wrote
Reply to comment by Deep-Station-1746 in Reduce paramter count in an NN without sacrificing performance [P] by ackbladder_
The laziest option is fp16 quantization - as easy as model.half() on most torch-based models, and it halves the physical size of the model. You could also try knowledge distillation (read up on how DistilBERT was made, for example). There's also arch-specific stuff: if you have a transformer, you could use xFormers' memory-efficient attention, for example. The list goes on and on.
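To make the "laziest option" concrete: fp16 stores each parameter in 2 bytes instead of 4, which is where the size halving comes from. A numpy sketch of the storage math (on a real torch model it's literally `model.half()`; the array here is just a stand-in for a weight tensor):

```python
import numpy as np

# Fake fp32 "layer" of 1M parameters
weights = np.random.randn(1000, 1000).astype(np.float32)

# Cast to half precision -- this is what model.half() does tensor-by-tensor
half = weights.astype(np.float16)

print(weights.nbytes)  # 4000000 bytes at 4 bytes/param
print(half.nbytes)     # 2000000 bytes at 2 bytes/param
assert half.nbytes * 2 == weights.nbytes

# The trade-off: fp16 keeps only ~3 decimal digits of precision,
# so values drift slightly when cast back.
max_err = np.abs(weights - half.astype(np.float32)).max()
```

Parameter count is unchanged, so this shrinks memory and bandwidth rather than "excess bulk" in the architectural sense; that's what distillation or pruning is for.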
Deep-Station-1746 t1_j146dhq wrote
> reduce excess bulk in a NN without sacrificing performance
Simply put, that is not possible. There is always a trade-off. So the question is: what are you willing to sacrifice? How much performance are you willing to forgo?
Deep-Station-1746 t1_j12m6fe wrote
Reply to [R] PyTorch implementation of Forward-Forward Algorithm by Geoffrey Hinton and analysis of performances over backpropagation by galaxy_dweller
Looking forward to updates. :)
Deep-Station-1746 t1_j0ylg2a wrote
Go for vast.ai if you don't have a huge budget. You can rent a 24GB VRAM instance for about $0.40/hr.
Deep-Station-1746 t1_j0t6yad wrote
> replacement for Machine Learning Drama Twitter
FTFY
Deep-Station-1746 t1_j0m0var wrote
Deep-Station-1746 t1_j06ku87 wrote
Reply to comment by TensorDudee in [P] Implemented Vision Transformers 🚀 from scratch using TensorFlow 2.x by TensorDudee
Everything is slow and hard to implement in TensorFlow, without much of a redeeming excuse either (compared to, e.g., JAX).
Deep-Station-1746 t1_j06kayz wrote
Solid work. This reminds me of that internet explorer meme.
Deep-Station-1746 t1_izrr83v wrote
If you're looking to maximize TFLOPs per dollar, just use vast.ai. Unless you have enterprise-level VRAM needs, vast will likely be much cheaper than anything the big cloud providers list.
Deep-Station-1746 t1_izpfe8i wrote
Reply to [D] A talk about ChatGPT by [deleted]
Correct me if I'm wrong, but this post feels like thinly veiled subreddit promotion and nothing else. :)
Deep-Station-1746 t1_izo734l wrote
Reply to [P] I made a command-line tool that explains your errors using ChatGPT (link in comments) by jsonathan
Is this just rewording the TypeError's str description? What context does ChatGPT actually get?
Deep-Station-1746 t1_je8yb60 wrote
Reply to [D] What do you think about all this hype for ChatGPT? by Dear-Vehicle-3215
nausea, noun. a feeling of loathing or disgust.