AI and its effect on your music, your job and the future

  • Thread starter M3CHK1LLA
  • Start date
  • This site may earn a commission from merchant affiliate links like Ebay, Amazon, and others.

crushingpetal

SS.org Regular
Joined
Nov 11, 2022
Messages
1,381
Reaction score
1,938
I missed this earlier, but I'm not sure all the text of the internet quite portrays a meaningful or complete view of the world. It certainly portrays a good chunk of how online folks like to talk about the world, but if we're scraping the reddits and x/twitters of the world for a large chunk of that, we're not exactly starting from a very reliable source of "understanding" the world to begin with.

If the goal is to plausibly sound like someone who would post on the internet, that might be enough, but I don't think that satisfies a model of understanding the world.
Again, +1. You're hitting it out of the park today. (Or pick your favorite sports analogy.)
 


narad

Progressive metal and politics
Joined
Feb 15, 2009
Messages
16,833
Reaction score
31,344
Location
Tokyo
Agree to disagree on these points.

> I think we have basically all the text-of-the-internet data we're ever going to need when it comes to gleaning an understanding of the world from raw text.

Try a thought experiment on a scenario where we grabbed the same amount of text from people in the 1970s.

I don't quite understand. People in the 1970s had human level intelligence, no?

> I missed this earlier, but I'm not sure all the text of the internet quite portrays a meaningful or complete view of the world. It certainly portrays a good chunk of how online folks like to talk about the world, but if we're scraping the reddits and x/twitters of the world for a large chunk of that, we're not exactly starting from a very reliable source of "understanding" the world to begin with.
>
> If the goal is to plausibly sound like someone who would post on the internet, that might be enough, but I don't think that satisfies a model of understanding the world.

I didn't say that we had enough text to have an understanding of the world. I said we had enough text to gain as much of an understanding of the world as we can "from raw text", i.e., as much as we're going to need from that modality.

The current and next-gen models are getting a lot of their comparatively new data from other modalities directly: audio data in the case of GPT-4o, transcriptions of audio in the case of a lot of LLMs, and video data in the case of Sora and some Gemini models.

So if you're talking about model collapse, that argument assumes the internet keeps growing in size, AI-generated content makes up a lot of it, and we still train naively and uniformly on the resulting web scrape. In reality, the scrape's usefulness is not going to keep growing in proportion to its size: future models will be trained on huge scrapes of data in other modalities, models will be used to score how likely each sample is to have come from a previous model, and data will be sampled selectively to maximize its utility to the new model.
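The selective-sampling idea in that last sentence can be sketched as a trivial filter. This is a toy illustration, not anyone's actual pipeline: the `ai_likelihood` field stands in for a hypothetical detector model's per-sample estimate that the text came from a previous-generation model, and the sample format and threshold are made up.

```python
# Toy sketch: filter a web scrape by a detector's estimate of AI origin.
# `ai_likelihood` is a hypothetical score in [0, 1] assigned upstream by a
# detector model; higher means more likely generated by an earlier model.

def filter_scrape(samples, max_ai_likelihood=0.5):
    """Keep only samples the detector considers likely human-written."""
    return [s for s in samples if s["ai_likelihood"] <= max_ai_likelihood]

scrape = [
    {"text": "forum post about guitars", "ai_likelihood": 0.1},
    {"text": "generic SEO spam page",    "ai_likelihood": 0.9},
    {"text": "news article",             "ai_likelihood": 0.4},
]

kept = filter_scrape(scrape)
print([s["text"] for s in kept])  # the two low-likelihood samples survive
```

A real curation pipeline would weight or resample rather than hard-threshold, and would combine this with quality and deduplication signals, but the shape of the idea is the same: score the data, then sample selectively instead of training uniformly on everything.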
 