-Rizhiy-
-Rizhiy- t1_jck6j55 wrote
Reply to comment by Spiritual-Reply5896 in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
Increasing the context window is a simple, albeit costly, method of increasing the amount of addressable information. Working with external memory is not as straightforward.
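To illustrate the extra moving parts external memory involves, here is a toy retrieval sketch: you need an embedding store plus a similarity search before the model ever sees the information (all vectors and "facts" here are made up for illustration):

```python
import math

# Toy external-memory store of (embedding, text) pairs. In a real system the
# embeddings would come from a trained encoder; these are hand-picked.
memory = [
    ([0.9, 0.1, 0.0], "fact A"),
    ([0.1, 0.9, 0.0], "fact B"),
    ([0.0, 0.2, 0.8], "fact C"),
]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def retrieve(query, k=1):
    # Rank stored items by similarity to the query embedding and return the
    # top-k texts; these would then be prepended to the model's context.
    ranked = sorted(memory, key=lambda item: cosine(query, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]

print(retrieve([1.0, 0.0, 0.0]))  # closest match is "fact A"
```

With a longer context window none of this machinery is needed: you just paste the raw information in and let attention sort it out.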
-Rizhiy- t1_jce09xx wrote
Reply to comment by 1F9 in [N] PyTorch 2.0: Our next generation release that is faster, more Pythonic and Dynamic as ever by [deleted]
There is a reason it is called PyTorch)
-Rizhiy- t1_jcblvqs wrote
Reply to [D] What do people think about OpenAI not releasing its research but benefiting from others’ research? Should google meta enforce its patents against them? by [deleted]
This is a moot point. Most companies use AI research without contributing back; that is what being a business generally means, nothing new here.
They just need to admit that they are a business now and want to earn money for their own benefit, rather than "benefit all of humanity". Changing the name would be a good idea too)
-Rizhiy- t1_jbzfsqt wrote
Reply to comment by f_max in [D] Is anyone trying to just brute force intelligence with enormous model sizes and existing SOTA architectures? Are there technical limitations stopping us? by hebekec256
> human level ai is probably worth more than all of big tech combined
What makes you say that? Where is the economic reasoning? For the vast majority of jobs, human labour costs ~$10/hour; a 100T-parameter model will most likely cost much more than that to run. There is a lot of uncertainty about whether even the current LLMs can be profitable.
I would say the main thing actually stopping the training of even larger LLMs is that the economic model has not been figured out yet.
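A back-of-envelope version of this comparison (every number below is an assumption for illustration, not a measurement):

```python
human_cost_per_hour = 10.0   # ~$10/hour for most jobs, as above

# Hypothetical serving cost for a very large model:
gpus_needed = 8              # assumed GPUs required to hold and serve it
gpu_cost_per_hour = 2.0      # assumed cloud price per GPU-hour

model_cost_per_hour = gpus_needed * gpu_cost_per_hour
print(model_cost_per_hour)   # 16.0 — already above the human's $10/hour
```

A 100T model would plausibly need far more than 8 GPUs, which only makes the comparison worse; the point is that the break-even depends entirely on inference cost, which nobody has pinned down.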
-Rizhiy- t1_j7ilv1v wrote
Reply to comment by Sirisian in [N] Google: An Important Next Step On Our AI Journey by EducationalCicada
I feel that they won't be trying to generate novel responses from the model, but rather take the knowledge graph plus relevant data from the first few responses and ask the model to summarise that, or turn it into an answer which humans find appealing.
That way you don't have to rely on the model to remember stuff, it can access all required information through attention.
-Rizhiy- t1_j2lkgn2 wrote
Reply to [D] What are good ways of incorporating non-sequential context into a transformer model? by abc220022
Look at papers dealing with multi-modal tasks. e.g. Perceiver/Perceiver IO by DeepMind
You can encode your data into tokens of the same size using something like an MLP, then feed these tokens into the decoder along with the encoder tokens. You should probably also add a learnable embedding for each type of data to prevent signal confusion.
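A minimal sketch of that idea, with plain linear maps standing in for the learned MLPs (dimensions, weights, and modality names are all assumptions):

```python
import random

random.seed(0)
TOKEN_DIM = 4  # common token size shared by all modalities (assumed)

def make_projection(in_dim, out_dim=TOKEN_DIM):
    # Stand-in for a learned MLP: a linear map from the modality's native
    # feature size down to the common token size.
    return [[random.gauss(0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]

def project(features, weights):
    return [sum(w * x for w, x in zip(row, features)) for row in weights]

# One learnable "type" embedding per modality, added to every token of that
# modality so the decoder can tell the signals apart.
type_embeddings = {"image": [0.1] * TOKEN_DIM, "tabular": [-0.1] * TOKEN_DIM}

img_proj = make_projection(in_dim=6)  # e.g. a 6-dim image patch feature
img_token = [a + b for a, b in zip(project([1.0] * 6, img_proj),
                                   type_embeddings["image"])]
print(len(img_token))  # every modality ends up as TOKEN_DIM-sized tokens
```

In a real model the projections and type embeddings would be trained end-to-end rather than fixed, but the shape bookkeeping is the same.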
-Rizhiy- t1_j1979ji wrote
Very difficult to give a definite answer without knowing more about your situation. Things to consider:
- How big is the model you are planning to train? Many large models are limited by VRAM size rather than compute, which means you will need either a 4090 or professional cards like the A100. Professional cards cost far more, and their price/performance trade-off is much worse than for gamer cards.
- How many cards do you need? You can probably fit 4 GPUs in a workstation and 16 in a server; more than that and you are looking at a multi-server setup.
- Where are you located? Electricity can be a large cost component, e.g. I'm in the UK and electricity is £0.34/kWh ($0.41), which comes to about $1,000 to run a typical GPU for a year straight.
- Are you able to set up a sufficient electricity supply where you are located? For a big server you are probably looking at a dedicated 5 kW line.
- What other components do you need? You will probably need at least a UPS and a fast CPU, which add to the cost. Multiple servers will probably also require a good switch.
- How are you going to cool it? Even a 4-GPU workstation will produce something like 1,500 W of heat and be rather loud. That might be fine during winter, but for summer you will probably need to install dedicated AC.
- Are you up to building/maintaining all that infrastructure yourself? How much is your time worth?
- Finally, explore all options. AWS has spot and reserved instances, which can be much cheaper than on-demand. Lambda Labs offers cheaper GPUs than AWS, and other cloud providers might have start-up discounts/funds.
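The electricity figure above can be sanity-checked with a quick calculation (assuming a ~300 W card running around the clock at the quoted price):

```python
# Annual electricity cost for one GPU, using the numbers from the list above.
power_kw = 0.3            # assumed typical GPU draw of ~300 W
price_per_kwh = 0.41      # $ per kWh (£0.34 converted, as quoted)
hours_per_year = 24 * 365

annual_cost = power_kw * hours_per_year * price_per_kwh
print(round(annual_cost))  # ≈ $1077, i.e. roughly $1000/year per GPU
```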
P.S. I work for AWS, so probably have a bias.
-Rizhiy- t1_j10nstz wrote
Reply to [D] Techniques to optimize a model when the loss over the training dataset has a Power Law type curve. by Dartagnjan
Can you collect more data similar to hard examples?
People like to focus on the architecture or training techniques, but most real problems can be solved by collecting more relevant data.
If the loss remains high even after getting more data, two potential problems come to mind:
- There is not enough information in your data to correctly predict the target.
- Your model is not complex/expressive enough to properly estimate the target.
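If collecting more data targeted at the hard examples is possible, one cheap way to exploit it (or even the existing data) is to oversample by loss. A minimal sketch, with invented per-example losses:

```python
import random

random.seed(42)

# Hypothetical per-example losses recorded during the last epoch; a power-law
# loss curve means a few examples dominate, like ids 2 and 4 here.
losses = [0.05, 0.08, 2.1, 0.07, 1.8, 0.06]

def sample_batch(example_ids, losses, batch_size):
    # Sample proportionally to loss, so hard (high-loss) examples are drawn
    # more often; this is one simple form of hard-example mining.
    return random.choices(example_ids, weights=losses, k=batch_size)

batch = sample_batch(list(range(len(losses))), losses, batch_size=4)
print(batch)  # will typically be dominated by the high-loss ids 2 and 4
```

This only helps if problem (1) above doesn't apply; no amount of resampling fixes examples whose targets are unpredictable from the inputs.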
-Rizhiy- t1_j030en1 wrote
Reply to comment by SherbertTiny2366 in [Discussion] Amazon's AutoML vs. open source statistical methods by fedegarzar
Thank you, that makes sense.
-Rizhiy- t1_j018jx5 wrote
Reply to comment by SherbertTiny2366 in [Discussion] Amazon's AutoML vs. open source statistical methods by fedegarzar
Do you by any chance have a resource that explains that a bit more?
I can't get my head around how a collection of accurate forecasts can produce an inaccurate aggregate.
Is it related to class imbalances, or perhaps something like Simpson's paradox?
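Simpson's paradox at least would make it possible on paper: a model can look more accurate on every segment yet worse in aggregate when segment sizes differ. A made-up numeric illustration (the counts are invented purely to produce the effect):

```python
# (errors, total) per segment for two hypothetical forecasting models.
model_x = {"A": (1, 10), "B": (80, 90)}
model_y = {"A": (10, 90), "B": (9, 10)}

def rate(errs, total):
    return errs / total

# Per segment, X has the lower error rate in both A and B...
for seg in ("A", "B"):
    assert rate(*model_x[seg]) < rate(*model_y[seg])

# ...but aggregated over everything, Y is far more accurate, because each
# model's totals are dominated by its worst segment.
x_total = rate(sum(e for e, _ in model_x.values()),
               sum(t for _, t in model_x.values()))
y_total = rate(sum(e for e, _ in model_y.values()),
               sum(t for _, t in model_y.values()))
print(x_total, y_total)  # 0.81 vs 0.19
```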
-Rizhiy- t1_jde9tj1 wrote
Reply to How do you rate your books on Goodreads? by pensieve64
5* - Couldn't stop reading it, or good-quality novel information which changed my perspective on the world
4* - Good, but wasn't completely obsessed
3* - Average, was bored in a few places
2* - Boring, most of them I didn't finish
1* - Hasn't happened yet, but would probably be for utter trash. I usually check the score on Goodreads before starting a book, so it probably won't happen.