How AI models are grabbing the world's data | GZERO AI

What does it take to build AI? Human labor, natural resources, and, most significantly, an insane amount of data! But how are tech giants like Meta and Google collecting this data? In this episode of GZERO AI, host Taylor Owen examines the scale and implications of the historic data land grab happening in the AI sector.
Subscribe to GZERO on YouTube and turn on notifications (🔔): / @gzeromedia
Sign up for GZERO Daily (free newsletter on global politics): rebrand.ly/gzeronewsletter
In this episode of GZERO AI, Taylor Owen, host of the Machines Like Us podcast, examines the scale and implications of the historic data land grab happening in the AI sector. According to researcher Kate Crawford, AI is the largest superstructure ever built by humans, requiring immense human labor, natural resources, and staggering amounts of data. But how are tech giants like Meta and Google amassing this data?
So AI researcher Kate Crawford recently told me that she thinks AI is the largest superstructure our species has ever built. This is because of the enormous amount of human labor that goes into building AI, the physical infrastructure needed for the compute of these AI systems, and the natural resources, the energy, and the water that go into this entire infrastructure. And, of course, because of the insane amounts of data needed to build our frontier models. It's increasingly clear that we're in the middle of a historic land grab for data, essentially for all of the data that has ever been created by humanity. So where is all this data coming from, and how are these companies getting access to it? Well, first, they're clearly scraping the public internet. It's safe to say that if anything you've done has been posted to the internet publicly, it's inside the training data of at least one of these models.
But it's also probably the case that this scraping includes a large amount of copyrighted data, or data that isn't necessarily publicly available. They're probably also getting behind paywalls, as we'll find out soon enough as the New York Times lawsuit against OpenAI works its way through the system. And they're scraping each other's data. According to the New York Times, Google found out that OpenAI was scraping YouTube, but didn't reveal it to the public, because they too were scraping all of YouTube themselves and didn't want this getting out. Second, all these companies are purchasing or licensing data. This includes news licensing, entering into agreements with publishers, purchasing data from data brokers, and acquiring, or getting access to, companies that hold rich datasets. Meta, for example, was considering buying the publisher Simon & Schuster just for access to its copyrighted books in order to train its LLM.
Read more:
Want to know more about global news and why it matters? Follow us on:
Instagram: / gzeromedia
Twitter: / gzeromedia
TikTok: / gzeromedia
Facebook: / gzeromedia
LinkedIn: / gzeromedia
Threads: threads.net/@gzeromedia
Subscribe to our YouTube channel and turn on notifications (🔔): / @gzeromedia
Subscribe to the GZERO podcast: podcasts.apple.com/us/podcast...
GZERO Media is a multimedia publisher providing news, insights and commentary on the events shaping our world. Our properties include GZERO World with Ian Bremmer, our newsletter GZERO Daily, Puppet Regime, the GZERO World Podcast, In 60 Seconds and GZEROMedia.com
#GZEROAI #AI #Data

Comments: 11

  • @tempusfugit3635 · 16 days ago

    Appreciate the effort but I think this was a bit topical. Would have liked a bit more depth, esp. around current ways companies are trying to break through data availability bottlenecks. We hear them talk about how they’ve hoovered up almost all the available high quality training data. So, what next? And how does that create political pressures (your core reporting competencies).

  • @brandonreed09 · 14 days ago

    You actually can opt out of the data scraping. That's why the AI companies are paying Reddit for their data: because Reddit blocked web scraping. Years ago Google set up a standard opt-out for web crawling and web scraping. Even with that, people may still try to do it. You can also use various service providers that will monitor your web traffic and block high-volume traffic, which is likely web scraping.
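    The standard opt-out this comment refers to is the robots.txt exclusion convention: a site lists crawler user agents and the paths they may not fetch, and well-behaved crawlers check it before scraping. A minimal sketch using Python's standard-library `urllib.robotparser`; `GPTBot` is OpenAI's documented crawler user agent, but the site and the rules below are hypothetical, and opting out relies entirely on the crawler choosing to comply.

    ```python
    from urllib.robotparser import RobotFileParser

    # Hypothetical robots.txt: block the GPTBot AI crawler, allow everyone else.
    robots_txt = """\
    User-agent: GPTBot
    Disallow: /

    User-agent: *
    Allow: /
    """

    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    # A compliant AI crawler would check this before fetching a page.
    print(parser.can_fetch("GPTBot", "https://example.com/article"))       # False: blocked
    print(parser.can_fetch("SomeOtherBot", "https://example.com/article"))  # True: allowed
    ```

    Note that robots.txt is purely advisory, which is why the comment also mentions traffic-level blocking as a backstop.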

  • @reginafefifofina · 16 days ago

    0:48 Cookies 🍪🥠🍪 Acceptance policies?

  • @jmhorange · 16 days ago

    I'd love to see a journalistic piece on how much all the data collected will cost tech companies. Just a ballpark figure. All the tech companies say they can't afford to pay for all the data they need for their AI models. And yet they provide no proof of the costs. A lot of the court cases and talk around regulations rest on the assumption that tech companies can afford to pay for all the data they use. Well we don't have a ballpark number for the costs of the data so we don't know if that assumption is true. We could be talking trillions here so it's a big deal. And we don't have unlimited time. The more AI is integrated into society, the more likely governments will be forced to legalize the use of data for free to protect the AI based economy and no one might get compensated for their data. We all helped create the AI models, but it seems currently we won't get any of the fruit of that labor. The rich get richer and the poor will get poorer. I think this is an area journalists could do research on, because I've seen nothing on it.

  • @DiegoMarquesBrazil · 16 days ago

    What for exactly?

  • @__________5737 · 16 days ago

    You need language data to train language models and videos to train video models

  • @LIV-FREE-VET · 12 days ago

    🤔

  • @user-fx7li2pg5k · 14 days ago

    I give them my axiology and philosophy methodology, everything I have lmao, ontology, epistemology

  • @kshen7485 · 16 days ago

    AI can solve some problems, but it can’t solve our social problems. We still urgently need a new political system to stop the dramatic western decline.

  • @WalterBurton · 16 days ago

    😬👎