Nvidia reportedly caught scraping AI data from Netflix and YouTube (again)

According to a damning report from 404 Media, backed by internal Slack chats, emails, and documents obtained by the outlet, Nvidia helped itself to “a human lifetime visual experience worth of training data per day,” as Ming-Yu Liu, vice president of research at Nvidia and a Cosmos project leader, admitted in a May email.

Unnamed former Nvidia employees told 404 Media that they had been asked to scrape video content from Netflix, YouTube, and other online sources to obtain training data for the company’s various AI products. Those include Nvidia’s Omniverse 3D world generator, its self-driving car systems, and its “digital human” efforts.

When those employees asked about the legality of the project, internally named Cosmos, management assured them that the highest levels of the company had cleared them to use that content.

The project sought to build a foundation model, akin to Gemini 1.5, GPT-4, or Llama 3.1, “that encapsulates simulation of light transport, physics, and intelligence in one place to unlock various downstream applications critical to Nvidia.”

To do this, the Cosmos project allegedly used an open-source video downloader and employed machine learning to hop between IP addresses, thereby avoiding YouTube’s attempts to block it. According to emails viewed by 404 Media, project managers discussed using as many as 30 virtual machines running on Amazon Web Services to download 80 years’ worth of full-length and clip-length videos every day.

For its part, Nvidia claims no wrongdoing. “We respect the rights of all content creators and are confident that our models and our research efforts are in full compliance with the letter and the spirit of copyright law,” an Nvidia spokesperson told 404 Media via email. “Copyright law protects particular expressions but not facts, ideas, data, or information. Anyone is free to learn facts, ideas, data, or information from another source and use it to make their own expressions. Fair use also protects the ability to use a work for a transformative purpose, such as model training.”

This is far from the first time that Nvidia (not to mention the vast majority of the rest of the AI field) has taken a “scrape first and maybe ask forgiveness later” approach to its AI training efforts. In July, Nvidia was named alongside Anthropic and Salesforce in another report on the unauthorized scraping of copyrighted videos.

At CES 2024, the company set off an internet firestorm with its ambiguous answers about how its new generative AI engine for games was trained. In response, Nvidia reiterated that its tools were “commercially safe.”

Andrew Tarantola
Former Computing Writer
Andrew Tarantola is a journalist with more than a decade reporting on emerging technologies ranging from robotics and machine…