NVIDIA, the renowned tech powerhouse, is currently entangled in a legal dispute initiated by a collective of authors. These authors allege the unauthorised use of their copyrighted materials in training its artificial intelligence platform, NeMo.
Authors Brian Keene, Abdi Nazemian, and Stewart O’Nan have asserted that their literary works were incorporated into a dataset comprising 196,640 books, which was used for NeMo’s training to replicate natural language. The process was terminated in October after reported copyright violations.
Authors Allege Copyright Infringement by NVIDIA in NeMo Training
In a recent development, authors Brian Keene, Abdi Nazemian, and Stewart O’Nan have initiated a proposed class action lawsuit against NVIDIA, filed Friday night in San Francisco federal court. The authors assert that NVIDIA “Admitted” to utilising their copyrighted works in training its NeMo artificial intelligence platform, thereby violating their copyrights. This legal action echoes similar lawsuits addressing AI-related copyright infringement.
The lawsuit aims to secure unspecified damages for individuals in the U.S. whose copyrighted materials contributed to training NeMo’s Large Language Models (LLMs) within the past three years. NVIDIA lauds LLMs as an efficient and cost-effective means of implementing generative AI.
Among the copyrighted works in question are Keene’s 2008 novel “Ghost Walk”, Nazemian’s 2019 novel “Like a Love Story”, and O’Nan’s 2007 novella “Last Night at the Lobster”. The suit alleges that these books were part of a dataset called “The Pile”, which encompassed a compilation of books referred to as “Books3”. NVIDIA reportedly confirmed training its NeMo Megatron AI models on The Pile and Books3.
Quick Link: NVIDIA’s Financial Triumph Amidst the AI Boom and Collaboration with Google
The NeMo Megatron models were previously accessible via a platform called Hugging Face, which provided a description of the AI models’ training dataset, explicitly stating their reliance on The Pile. However, as of October 2023, the Books3 dataset was removed from Hugging Face, accompanied by a notice citing “Reported copyright infringement” as the reason for its unavailability. NVIDIA chose not to comment on the ongoing legal proceedings.
NVIDIA‘s Phenomenal Rise in the Chip Market
As a primary chip manufacturer for highly sought-after AI chips, NVIDIA has witnessed an astounding surge in its stock price, catapulting by nearly 600% since the close of 2022 and securing a remarkable market valuation nearing $2.2 trillion. However, as of 3:15 p.m. ET Monday, NVIDIA shares experienced a slight dip of 1.7%, settling at $860.17.
However, despite a significant intraday reversal of over 10% and concluding Friday’s session 5.6% lower, the stock has achieved a noteworthy 74% gain since the onset of 2024. This fluctuation occurred as investors hurried to seize profits in response to apprehensions voiced by prominent analysts regarding the stock’s lofty trading levels.
While NVIDIA is renowned for its GeForce GTX and RTX line of desktop and laptop graphics cards, its dominance in Artificial Intelligence (AI) across software and hardware markets has solidified over recent years. This AI prowess was instrumental in propelling the company to attain a $2 trillion market cap late last month.
Navigating Copyright Challenges in AI Development
In the wake of this lawsuit, NVIDIA joins a growing roster of tech companies that are facing legal scrutiny regarding their utilisation of datasets for training AI technologies. The demand for extensive data to train generative AI models like OpenAI’s ChatGPT-3 or Google Bard has thrust the issue into the spotlight of copyright law, prompting numerous legal actions, particularly from creative professionals such as artists and authors, who contest the use of their work without consent.
A noteworthy example among these litigations is OpenAI’s confrontation with 18 authors, including the renowned George R. R. Martin, over alleged copyright infringement. Meta and Microsoft are also entangled in legal battles. Meta is accused of employing copyrighted material to train its LLMs. At the same time, Microsoft and OpenAI face a lawsuit from The New York Times for purported “Unauthorised use of published work” in their AI training.
Related: OpenAI, Microsoft Hit with New Author Copyright Lawsuit Over AI Training
Our Final Say
The surge in lawsuits over the past year highlights the potential of artificial intelligence to reshape United States copyright law. While tech companies argue that AI training mirrors human learning processes and is protected under “Fair use”, plaintiffs counter that the absence of consent for using their creations in AI training constitutes a misappropriation of their intellectual property.
NVIDIA’s legal challenges are a small part of the broader debate surrounding AI and copyright law. It highlights the need for clarity and consensus in navigating the intersection of technology and intellectual property rights.