NVIDIA's AI team reportedly extracted videos from YouTube, Netflix without permission


In the latest example of a troubling industry pattern, NVIDIA appears to have amassed a large amount of copyrighted content for AI training. On Monday, 404 Media’s Samantha Cole reported that the $2.4 trillion company asked employees to download videos from YouTube, Netflix and other datasets to develop commercial AI projects. The graphics card maker is among the tech companies that appear to be adopting a “move fast and break things” ethos as they race for dominance in this feverish, often embarrassing AI gold rush.

The training was reportedly to develop models for products such as its Omniverse 3D world generator, self-driving car systems and “digital human” efforts.

NVIDIA defended its actions in an email to Engadget. A company spokesperson said its research “fully complies with the letter and spirit of copyright law” while saying IP laws protect certain expressions “but not facts, opinions, data, or information.” The company equated this practice with a person’s right to “read facts, opinions, data, or information from another source and use them to express their own opinion.” Human, computer… what’s the difference?

YouTube doesn’t seem to agree. Spokesperson Jack Malon pointed us to a Bloomberg story from April, which cited CEO Neal Mohan as saying that using YouTube to train AI models would be a “clear violation” of its terms. “Our previous comments still stand,” YouTube’s policy communications manager wrote to Engadget.

That quote from Mohan in April was in response to reports that OpenAI trained its Sora text-to-video generator on YouTube videos without permission. Last month, a report indicated that the AI startup Runway followed suit.

NVIDIA employees who raised ethical and legal concerns about the practice were reportedly told by their superiors that it had been green-lit at the company’s highest levels. “This is a top decision,” replied Ming-Yu Liu, vice president of research at NVIDIA. “We have umbrella permission for all data.” Others in the company are said to have described the practice as an “open legal issue” that they will address down the road.

It all sounds a lot like the old “move fast and break things” slogan of Facebook (now Meta), which surprisingly succeeded in breaking quite a few things. That included the privacy of millions of people.

In addition to YouTube and Netflix videos, NVIDIA reportedly instructed employees to train on the movie trailer database MovieNet, internal libraries of video game images and the GitHub video datasets WebVid (since taken down) and InternVid-10M. The latter is a dataset containing 10 million YouTube video IDs.

Some of the data that NVIDIA allegedly trained on is marked as suitable for educational (or non-commercial) use only. HD-VG-130M, a library of 130 million YouTube videos, includes a usage license specifying that it is for academic research only. NVIDIA reportedly brushed aside concerns about the academic-only terms, insisting that the datasets were fair game for its commercial AI products.

To evade detection by YouTube, NVIDIA reportedly downloaded content using virtual machines (VMs) with rotating IP addresses to avoid bans. In response to an employee’s suggestion to use a third-party IP address rotation tool, one NVIDIA employee reportedly replied that on Amazon Web Services, restarting a virtual machine instance assigns a new public IP, so it was “not a problem so far.”

404 Media’s full report on NVIDIA is a must-read.
