-Rizhiy-

-Rizhiy- t1_jde9tj1 wrote

5* - Couldn't stop reading it / good-quality, novel information that changed my perspective on the world

4* - Good, but wasn't completely obsessed

3* - Average, was bored in a few places

2* - Boring; most of these I didn't finish

1* - Hasn't happened yet, but reserved for utter trash. I usually check the score on Goodreads before starting a book, so it probably never will.

1

-Rizhiy- t1_jcblvqs wrote

This is a moot point. Most companies use AI research without contributing back; that is generally what being a business means, nothing new here.

They just need to admit that they are a business now and want to earn money for their own benefit, rather than to "benefit all of humanity". Changing the name would be a good idea too.

11

-Rizhiy- t1_jbzfsqt wrote

> human level ai is probably worth more than all of big tech combined

What makes you say that? Where is the economic reasoning? For the vast majority of jobs, human labour costs ~$10/hour; a 100T-parameter model will most likely cost much more than that to run. There is a lot of uncertainty about whether current LLMs can be profitable.

I would say the main thing actually stopping the training of even larger LLMs is that the economic model hasn't been figured out yet.
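To make the comparison concrete, here is a back-of-envelope sketch of serving cost versus human labour cost. Every number below (GPU count, GPU-hour price, throughput) is an illustrative assumption, not a measured figure:

```python
# Back-of-envelope: LLM serving cost vs. human labour cost.
# All numbers are illustrative assumptions, not measured figures.

human_cost_per_hour = 10.0     # assumed wage for "the vast majority of jobs"

gpus_needed = 8                # assumed GPUs to hold a very large model
gpu_cost_per_hour = 2.0        # assumed cloud price per GPU-hour
tokens_per_second = 30         # assumed generation throughput

serving_cost_per_hour = gpus_needed * gpu_cost_per_hour   # $/hour
tokens_per_hour = tokens_per_second * 3600
cost_per_1k_tokens = serving_cost_per_hour / tokens_per_hour * 1000

print(f"Serving: ${serving_cost_per_hour:.2f}/hour vs human ${human_cost_per_hour:.2f}/hour")
print(f"~${cost_per_1k_tokens:.3f} per 1k generated tokens")
```

Under these assumed numbers the serving cost already exceeds the wage; batching and cheaper hardware change the picture, which is exactly the uncertainty.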

0

-Rizhiy- t1_j7ilv1v wrote

I feel that they won't be trying to generate novel responses from the model, but will rather take a knowledge graph plus relevant data from the first few responses and ask the model to summarise that, or turn it into an answer that humans find appealing.

That way you don't have to rely on the model to remember things; it can access all the required information through attention.
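A minimal sketch of that idea: rather than asking the model to recall facts from its weights, put the retrieved knowledge-graph facts into the prompt and only ask it to rephrase them. The triple format and prompt wording here are illustrative assumptions:

```python
# Sketch: build a prompt from retrieved knowledge-graph triples so the model
# only has to summarise, not remember. Triple format is an assumption.

def build_prompt(question: str, kg_triples: list[tuple[str, str, str]]) -> str:
    facts = "\n".join(f"- {s} {p} {o}" for s, p, o in kg_triples)
    return (
        "Using ONLY the facts below, write a concise, friendly answer.\n"
        f"Facts:\n{facts}\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_prompt(
    "Who developed the Perceiver architecture?",
    [("Perceiver", "developed by", "DeepMind"),
     ("Perceiver IO", "extends", "Perceiver")],
)
print(prompt)
```

All the information needed for the answer is now in the context window, available through attention.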

14

-Rizhiy- t1_j2lkgn2 wrote

Look at papers dealing with multi-modal tasks, e.g. Perceiver/Perceiver IO by DeepMind.

You can encode your data into tokens of the same size using something like an MLP, then feed these tokens into the decoder along with the encoder tokens. You should probably also add a learnable embedding for each data type to prevent signal confusion.
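A rough NumPy sketch of that recipe, with random arrays standing in for learned MLP weights and type embeddings (all shapes are illustrative assumptions):

```python
# Sketch: encode heterogeneous inputs into same-size tokens with a small MLP,
# then add a per-modality type embedding. Random arrays stand in for learned
# parameters; all shapes are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d_model = 64

def mlp_encode(x, w1, w2):
    """Project raw features of any width to d_model-sized tokens."""
    return np.maximum(x @ w1, 0.0) @ w2   # (n_tokens, d_model)

# Two modalities with different raw feature widths:
image_feats = rng.normal(size=(10, 128))   # e.g. 10 patch vectors
audio_feats = rng.normal(size=(25, 40))    # e.g. 25 frame vectors

img_w1, img_w2 = rng.normal(size=(128, 256)), rng.normal(size=(256, d_model))
aud_w1, aud_w2 = rng.normal(size=(40, 256)), rng.normal(size=(256, d_model))

# One learnable type embedding per modality, added to every token of that
# type so downstream attention can tell the signals apart:
type_emb = rng.normal(size=(2, d_model))

img_tokens = mlp_encode(image_feats, img_w1, img_w2) + type_emb[0]
aud_tokens = mlp_encode(audio_feats, aud_w1, aud_w2) + type_emb[1]

tokens = np.concatenate([img_tokens, aud_tokens], axis=0)
print(tokens.shape)   # same-size tokens from different modalities
```

After this step every modality looks identical to the decoder: a sequence of d_model-sized tokens.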

6

-Rizhiy- t1_j1979ji wrote

Very difficult to give a definite answer without knowing more about your situation. Things to consider:

  • How big is the model you are planning to train? Many large models are limited by VRAM rather than compute, which means you will need either a 4090 or professional cards like the A100. Professional cards cost far more, and their price trade-offs are much less favourable than consumer cards'.
  • How many cards do you need? You can probably build a workstation with 4 GPUs, or a server with 16; more than that and you are looking at a multi-server setup.
  • Where are you located? Electricity can be a large cost component. e.g. I'm in the UK, where electricity is £0.34/kWh ($0.41), which works out to about $1000 to run a typical GPU for a year straight.
  • Are you able to set up a sufficient electricity supply where you are located? For a big server you are probably looking at a dedicated 5 kW line.
  • What other components do you need? You will probably need at least a UPS and a fast CPU, which adds to the cost. Multiple servers will also require a good switch.
  • How are you going to cool it? Even a 4xGPU workstation will produce something like 1500W of heat and be rather loud. It might be fine during winter, but for summer you will probably need to install a dedicated AC.
  • Are you up to building/maintaining all that infrastructure yourself? How much is your time worth?
  • Finally, explore all options. AWS has spot and reserved instances, which can be much cheaper than on-demand. LambdaLabs offer cheaper GPUs than AWS. Other cloud providers might have start-up discounts/funds.
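The electricity figure above checks out with simple arithmetic; the average power draw here (~300 W for a "typical" GPU under load) is an assumption:

```python
# Sanity check of the yearly electricity cost. Average GPU power draw
# (~300 W under load) is an assumption; the price is the one quoted above.
price_per_kwh = 0.41          # $/kWh, UK price quoted above
gpu_watts = 300               # assumed average draw
hours_per_year = 24 * 365

kwh_per_year = gpu_watts / 1000 * hours_per_year
cost_per_year = kwh_per_year * price_per_kwh
print(f"{kwh_per_year:.0f} kWh/year ~= ${cost_per_year:.0f}/year")
```

That lands at roughly $1000/year per GPU, so a 4-GPU workstation running flat out costs ~$4000/year in electricity alone at UK prices.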

P.S. I work for AWS, so I probably have a bias.

2

-Rizhiy- t1_j10nstz wrote

Can you collect more data similar to hard examples?

People like to focus on the architecture or training techniques, but most real problems can be solved by collecting more relevant data.

If the loss remains high even after getting more data, two potential problems come to mind:

  • There is not enough information in your data to correctly predict the target.
  • Your model is not complex/expressive enough to properly estimate the target.
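Problem (1) can be demonstrated on synthetic data: if part of the target is noise not present in the features, no model can push the loss below that noise floor. Everything below is an illustrative toy setup:

```python
# Toy demonstration of problem (1): when the features lack information about
# the target, even the optimal model's loss stays at the noise floor.
# Synthetic data; all numbers are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
noise = rng.normal(scale=0.5, size=n)   # information NOT present in x
y = 2.0 * x + noise

# Best possible linear fit (closed-form least squares):
w = (x @ y) / (x @ x)
mse = np.mean((y - w * x) ** 2)

# The optimal model's MSE sits near the noise variance (0.5**2 = 0.25):
print(f"best achievable MSE ~= {mse:.3f}")
```

If your real loss plateaus like this no matter how much capacity you add, suspect the data; if a bigger model keeps lowering training loss, suspect capacity instead.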

13