nikola-b t1_j9mdw5s wrote
Reply to [D] Faster Flan-T5 inference by _learn_faster_
Might not be what you want, but you can use our hosted flan-t5 models at deepinfra.com. This way you can just call them via an API, even flan-t5-xxl. Disclaimer: I work at Deep Infra.
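Calling a hosted model boils down to an authenticated HTTP POST with a JSON body. A minimal sketch is below; note that the endpoint URL and payload shape are my assumptions for illustration, not DeepInfra's documented API, so check deepinfra.com for the real parameters.

```python
# Hypothetical sketch of calling a hosted model over HTTP. The endpoint URL
# and the payload fields are ASSUMPTIONS, not DeepInfra's documented API.
import json

API_URL = "https://api.deepinfra.com/v1/inference/google/flan-t5-xxl"  # assumed

def build_request(prompt: str, token: str) -> tuple[dict, dict]:
    """Build the headers and JSON payload for a text-generation request."""
    headers = {
        "Authorization": f"Bearer {token}",   # API token auth is assumed
        "Content-Type": "application/json",
    }
    payload = {"input": prompt}               # field name is assumed
    return headers, payload

# To actually send it (needs the `requests` package and a valid token):
# import requests
# headers, payload = build_request("Translate to German: Hello", "<TOKEN>")
# resp = requests.post(API_URL, headers=headers, data=json.dumps(payload))
# print(resp.json())
```

The upside of this setup is that the client needs no GPU or model weights at all; the request/response round trip is the whole integration.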
nikola-b t1_j9hk5q4 wrote
Reply to comment by tyras_ in [D] Large Language Models feasible to run on 32GB RAM / 8 GB VRAM / 24GB VRAM by head_robotics
Free for now; we have not added the payment workflow yet. In the future you will be billed only for inference time, so with one hour of compute you should be able to generate lots of tokens. I also added EleutherAI/gpt-neo-2.7B and EleutherAI/gpt-j-6B if the OP wants to try them.
nikola-b t1_j9crv4u wrote
Reply to comment by DevarshTare in [D] Simple Questions Thread by AutoModerator
I would say more memory is more important. Buy the 3060 with 12GB; if you have more money, get the 3090 with 24GB. Memory matters more in my view because it determines how big a model you can run at all.
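A quick back-of-the-envelope check makes the point: weight memory is roughly parameter count times bytes per parameter (2 bytes in fp16, 4 in fp32), before activation and KV-cache overhead. A small sketch:

```python
# Rough VRAM estimate for model weights alone (ignores activations,
# optimizer state, and KV cache, which add real overhead on top).
def vram_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Approximate weight memory in GiB (fp16 = 2 bytes, fp32 = 4 bytes)."""
    return params_billions * 1e9 * bytes_per_param / 1024**3

# GPT-J-6B in fp16 needs about 11 GiB just for weights, so it barely
# fits on a 12GB 3060 and will not fit at all in fp32 (~22 GiB).
print(round(vram_gb(6), 1))       # fp16
print(round(vram_gb(6, 4), 1))    # fp32
```

This is why the 24GB card opens up a different class of models rather than just running the same ones faster.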
nikola-b t1_j9cqkys wrote
Reply to [D] Large Language Models feasible to run on 32GB RAM / 8 GB VRAM / 24GB VRAM by head_robotics
Not sure if this helps, but you can use our hosted flan-t5 model at deepinfra.com via an HTTP API. It's free for now. Disclaimer: I work at DeepInfra. If you want GPT-Neo or GPT-J I can deploy those as well.
nikola-b t1_j9ujdux wrote
Reply to comment by tyras_ in [D] Large Language Models feasible to run on 32GB RAM / 8 GB VRAM / 24GB VRAM by head_robotics
There was an auth bug in the code. Sorry about that. Please try again now.