Deep-Station-1746

Deep-Station-1746 t1_je8u12c wrote

This is interesting - unlike LoRA, it also lets LLaMA accept images as inputs. And I believe it is orthogonal to LoRA, meaning the two could possibly be used together. I'm unsure about the training stability though. I know that LoRA training tolerates ridiculously high learning rates (1e-5 for the text encoder), especially for DreamBooth. Using LoRA on the frozen weights + the LLaMA adapter is an interesting thing to explore.
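For anyone unfamiliar with why LoRA is "orthogonal" to the frozen base weights: it learns a low-rank additive update on top of them. Here is a minimal from-scratch sketch of that idea in plain torch (the class name `LoRALinear` and the `r`/`alpha` values are illustrative, not from any particular library):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Sketch of LoRA: frozen base linear + trainable low-rank update B @ A."""
    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # pretrained weights stay frozen
        # A gets a small random init, B starts at zero, so at init the
        # LoRA branch contributes nothing and the model is unchanged.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(64, 64))
x = torch.randn(2, 64)
# At init B == 0, so output equals the frozen base layer's output:
assert torch.allclose(layer(x), layer.base(x))
```

Because the update is purely additive, nothing in principle stops you from stacking it with a separate adapter module on the same frozen backbone - which is the combination being speculated about here.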

Edit: spelling

10

Deep-Station-1746 t1_jdhhbbg wrote

Nope. The ability to input something doesn't mean being able to use it reliably. For example, take this post - your eyes can take in all the info on the screen, but as a contribution this post is pretty worthless. And you are a lot smarter than GPT-4, I think.

Edit: spelling

−19

Deep-Station-1746 t1_jcamy6n wrote

Patenting dropout feels a lot like NFTs - it's useless. So why bother?

Edit:

What I don't understand is how anyone can prove that someone is multiplying matrices together in some particular way, as long as they don't admit to it themselves.

That's like someone patenting a thought. If you think about a particular patented pair of pants™, can you be sued for propagating a patented neural activity through your bio network? It's absurd.

−13

Deep-Station-1746 t1_j91egc2 wrote

Isn't this kind of high-quantity, low-quality trend inevitable once the base topic passes some threshold of popularity? Is there any reason to fight the inevitable, instead of forming more niche, less popular communities?

36

Deep-Station-1746 t1_j146uw3 wrote

The laziest option is fp16 quantization. It's as easy as model.half() on most torch-based models, and it halves the physical size of the model. You could also try knowledge distillation (read up on how DistilBERT was made, for example). You can also do things that are more architecture-specific: if you have a transformer, for example, you could use xformers' memory-efficient attention. The list goes on and on.

6