nikola-b t1_j9mdw5s wrote
Reply to [D] Faster Flan-T5 inference by _learn_faster_
Might not be what you want, but you can use our hosted flan-t5 models at deepinfra.com. This way you can just call them via an API, even flan-t5-xxl. Disclaimer: I work at Deep Infra.
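Calling a hosted model boils down to an authenticated HTTP POST with a JSON body. A minimal sketch is below; note that the endpoint URL and payload shape are my assumptions for illustration, not DeepInfra's documented API, so check deepinfra.com for the real parameters.

```python
# Hypothetical sketch of calling a hosted model over HTTP. The endpoint URL
# and the payload fields are ASSUMPTIONS, not DeepInfra's documented API.
import json

API_URL = "https://api.deepinfra.com/v1/inference/google/flan-t5-xxl"  # assumed

def build_request(prompt: str, token: str) -> tuple[dict, dict]:
    """Build the headers and JSON payload for a text-generation request."""
    headers = {
        "Authorization": f"Bearer {token}",   # API token auth is assumed
        "Content-Type": "application/json",
    }
    payload = {"input": prompt}               # field name is assumed
    return headers, payload

# To actually send it (needs the `requests` package and a valid token):
# import requests
# headers, payload = build_request("Translate to German: Hello", "<TOKEN>")
# resp = requests.post(API_URL, headers=headers, data=json.dumps(payload))
# print(resp.json())
```

The upside of this setup is that the client needs no GPU or model weights at all; the request/response round trip is the whole integration.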
nikola-b t1_j9hk5q4 wrote
Reply to comment by tyras_ in [D] Large Language Models feasible to run on 32GB RAM / 8 GB VRAM / 24GB VRAM by head_robotics
Free for now; we have not added the payment workflow yet. In the future you will be billed only for inference time, so with one hour of compute you should be able to generate lots of tokens. I also added EleutherAI/gpt-neo-2.7B and EleutherAI/gpt-j-6B if the OP wants to try them.
nikola-b t1_j9crv4u wrote
Reply to comment by DevarshTare in [D] Simple Questions Thread by AutoModerator
I would say more memory is more important. Buy the 3060 with 12GB; if you have more money, get the 3090 with 24GB. Memory matters more in my view because it determines how big a model you can run at all.
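A quick back-of-the-envelope check makes the point: weight memory is roughly parameter count times bytes per parameter (2 bytes in fp16, 4 in fp32), before activation and KV-cache overhead. A small sketch:

```python
# Rough VRAM estimate for model weights alone (ignores activations,
# optimizer state, and KV cache, which add real overhead on top).
def vram_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Approximate weight memory in GiB (fp16 = 2 bytes, fp32 = 4 bytes)."""
    return params_billions * 1e9 * bytes_per_param / 1024**3

# GPT-J-6B in fp16 needs about 11 GiB just for weights, so it barely
# fits on a 12GB 3060 and will not fit at all in fp32 (~22 GiB).
print(round(vram_gb(6), 1))       # fp16
print(round(vram_gb(6, 4), 1))    # fp32
```

This is why the 24GB card opens up a different class of models rather than just running the same ones faster.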
nikola-b t1_j9cqkys wrote
Reply to [D] Large Language Models feasible to run on 32GB RAM / 8 GB VRAM / 24GB VRAM by head_robotics
Not sure if this helps, but you can use our hosted flan-t5 model at deepinfra.com via an HTTP API. It's free for now. Disclaimer: I work at DeepInfra. If you want GPT-Neo or GPT-J I can deploy those as well.
nikola-b t1_j9ujdux wrote
Reply to comment by tyras_ in [D] Large Language Models feasible to run on 32GB RAM / 8 GB VRAM / 24GB VRAM by head_robotics
There was an auth bug in the code. Sorry about that. Please try again now.