ChingChong--PingPong t1_jdwfooc wrote
Reply to comment by BellyDancerUrgot in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
It was not trained on basically the entire internet. Not even close. Even if they had trained it on every page Google has indexed, that still wouldn't be anywhere near the entire internet, and I'm not even talking about the dark web. Add in all the data behind user accounts, paywalls, and intranets, then all the audio and video on every social media and streaming platform, and OpenAI couldn't afford to train, much less optimize, much less host, a model of that scale.
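To put rough numbers on the gap, here's a back-of-envelope sketch. It uses public ballpark figures, not exact measurements: the GPT-3 paper reports roughly 570 GB of filtered training text, and Google has stated its search index is well over 100,000,000 GB.

```python
# Back-of-envelope comparison (rough public ballpark figures, not exact):
# - GPT-3's training set: ~570 GB of filtered text
#   (per the GPT-3 paper, "Language Models are Few-Shot Learners")
# - Google's search index: reportedly well over 100,000,000 GB

gpt3_training_gb = 570            # filtered text GPT-3 actually trained on
google_index_gb = 100_000_000     # Google's own public ballpark

ratio = google_index_gb / gpt3_training_gb
print(f"Google's index is roughly {ratio:,.0f}x larger "
      f"than GPT-3's filtered training text.")
# Prints ~175,439x -- and the index itself is only a slice of the internet.
```

And that ratio only covers indexed text, before counting any of the paywalled, private, or audio/video data.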