OpenAI harvested over a million hours of YouTube content for GPT-4: Report

OpenAI harvested more than a million hours of YouTube content to train its most advanced large language model, GPT-4, a report has claimed.

Artificial intelligence firms have been scrambling to find new sources of training data for their models, having already exhausted most traditional repositories of human knowledge, such as books, newspapers, and scientific databases.

Such data harvesting has often run afoul of copyright law, with OpenAI and several other AI firms facing lawsuits from writers and publishers alike.

According to a report by The New York Times, OpenAI used its speech recognition tool Whisper to transcribe the videos, and the resulting transcripts were used to train the model. The firm's president, Greg Brockman, was personally involved in collecting the videos, the NYT wrote.

OpenAI spokesperson Lindsay Held told The Verge in an email that the company curates "unique" datasets for each of its models to "help their understanding of the world."

The spokesperson added that the company is also now looking into generating its own synthetic data.

Google spokesperson Matt Bryant told The Verge that the company has "seen unconfirmed reports" of OpenAI’s activity.

He said, "Both our robots.txt files and Terms of Service prohibit unauthorized scraping or downloading of YouTube content."

YouTube CEO Neal Mohan had also suggested earlier that OpenAI may have used the video streamer's content to train its text-to-video AI model Sora.

Both, however, stopped short of saying whether OpenAI's actions merit legal action from the company.

Voltaire
