Business-Lead2679 OP t1_je9erdj wrote
Reply to comment by Justice43 in [D] Training a 65b LLaMA model by Business-Lead2679
Just checked it out - looks interesting. Unfortunately, the availability of this instance is quite limited, so I'm not sure if I can get access to it.
Business-Lead2679 OP t1_je7aefg wrote
Reply to comment by WarProfessional3278 in [D] Training a 65b LLaMA model by Business-Lead2679
Just like Alpaca. Even the JSON format is the same as the one released by Stanford, just with different inputs & outputs
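For reference, a record in the Stanford Alpaca JSON format has three fields: an instruction, an optional input, and an output. A minimal sketch in Python (the field values here are made-up examples, not data from the actual dataset):

```python
import json

# One Alpaca-style training record: an instruction, an optional input
# providing context, and the expected model output.
record = {
    "instruction": "Summarize the following text in one sentence.",
    "input": "LLaMA is a family of large language models released by Meta.",
    "output": "LLaMA is Meta's family of large language models.",
}

# The dataset file itself is a JSON array of such records.
dataset = [record]
print(json.dumps(dataset, indent=2))
```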
Business-Lead2679 OP t1_je794o8 wrote
Reply to comment by WarProfessional3278 in [D] Training a 65b LLaMA model by Business-Lead2679
I tried vast.ai, which didn't work. I'm a newbie, so maybe I'm doing something wrong.
Business-Lead2679 OP t1_je792jz wrote
Reply to comment by WarProfessional3278 in [D] Training a 65b LLaMA model by Business-Lead2679
Finetuning
Business-Lead2679 OP t1_je70nka wrote
Reply to [D] Training a 65b LLaMA model by Business-Lead2679
I'd like to train it with these settings:
EPOCHS = 3
LEARNING_RATE = 2e-5
CUTOFF_LEN = 1024
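A minimal sketch of what these settings mean in practice (pure Python; the truncation helper and token counts are illustrative assumptions, not the actual training code): CUTOFF_LEN caps the tokenized length of each training example so it fits the model's context window.

```python
# Hyperparameters from above.
EPOCHS = 3          # full passes over the training data
LEARNING_RATE = 2e-5
CUTOFF_LEN = 1024   # max tokens per training example

def truncate_tokens(token_ids, cutoff_len=CUTOFF_LEN):
    """Drop tokens beyond the cutoff so the example fits the context window."""
    return token_ids[:cutoff_len]

# Illustrative: a 1500-token example gets clipped to 1024 tokens,
# while a shorter example passes through unchanged.
long_example = list(range(1500))
short_example = list(range(300))
print(len(truncate_tokens(long_example)))   # 1024
print(len(truncate_tokens(short_example)))  # 300
```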
Business-Lead2679 OP t1_jecfagu wrote
Reply to comment by Rei1003 in [P] Introducing Vicuna: An open-source language model based on LLaMA 13B by Business-Lead2679
The main point of these open-source 10b models is to make them fit on average consumer hardware while still providing strong performance, even offline. A 100b model is hard to train because of its size, and even harder to host on a server powerful enough to handle multiple requests at the same time with good generation speed. Not to mention how expensive that can be to run. 1b models, on the other hand, usually do not perform well, as they do not have enough capacity. Some models at that size are good, yes, but a correctly trained 10b model is usually significantly better and can still fit on consumer hardware.