Nvidia reportedly caught scraping AI data from Netflix and YouTube (again)

According to a damning report from 404 Media, backed by internal Slack chats, emails, and documents obtained by the outlet, Nvidia helped itself to “a human lifetime visual experience worth of training data per day,” as Ming-Yu Liu, vice president of Research at Nvidia and a Cosmos project leader, admitted in a May email.

Unnamed former Nvidia employees told 404 that they had been asked to scrape video content from Netflix, YouTube, and other online sources to obtain training data for the company’s various AI products. Those include Nvidia’s Omniverse 3D world generator, self-driving car systems, and “digital human” products.

When those employees asked about the legality of the project, internally named Cosmos, management assured them that the highest levels of the company had cleared the use of that content.

The project sought to build a foundation model, akin to Gemini 1.5, GPT-4, or Llama 3.1, “that encapsulates simulation of light transport, physics, and intelligence in one place to unlock various downstream applications critical to Nvidia.”

To do this, project Cosmos allegedly used an open-source video downloader and employed virtual machines to hop between IP addresses, thereby avoiding YouTube’s attempts to block it. According to emails viewed by 404, project managers discussed using as many as 30 virtual machines running on Amazon Web Services to download 80 years’ worth of full-length and clip-length videos every day.

For its part, Nvidia claims no wrongdoing. “We respect the rights of all content creators and are confident that our models and our research efforts are in full compliance with the letter and the spirit of copyright law,” an Nvidia spokesperson told 404 Media via email. “Copyright law protects particular expressions but not facts, ideas, data, or information. Anyone is free to learn facts, ideas, data, or information from another source and use it to make their own expressions. Fair use also protects the ability to use a work for a transformative purpose, such as model training.”

This is far from the first time that Nvidia (not to mention the vast majority of the rest of the AI field) has taken a “scrape first and maybe ask forgiveness later” approach to its AI training efforts. In July, another report on illegal scraping of copyrighted videos named Nvidia alongside Anthropic and Salesforce.

At CES 2024, the company set off an internet firestorm with its ambiguous answers about how its new generative AI engine for gaming was trained. In response, Nvidia reiterated that its tools were “commercially safe.”

Andrew Tarantola
Former Computing Writer