Federal Judge Rules on OpenAI’s Use of Authors’ Works in Training AI Model
In a significant development, two copyright infringement lawsuits filed against artificial intelligence company OpenAI by comedian Sarah Silverman and novelist Paul Tremblay have been partially dismissed by a federal judge in California. The cases alleged that OpenAI unlawfully used their books to train the large language model behind ChatGPT, its artificial intelligence tool that generates human-like text in response to prompts.
Key Rulings and Dismissals
District court judge Araceli Martínez-Olguín granted most of OpenAI’s motion to dismiss several claims brought by the authors. Notably, the claim of vicarious copyright infringement was dismissed because the authors failed to demonstrate “substantial similarity” between their books and the output generated by ChatGPT. Additionally, the assertion that all ChatGPT outputs are “infringing derivative work” was deemed insufficient.
Other dismissed claims included allegations of negligence, unjust enrichment, and violations of the Digital Millennium Copyright Act. However, OpenAI still faces an allegation that it violated unfair competition law by using copyrighted books without obtaining author permission.
Consolidation and Ongoing Allegations
The judge further ruled to consolidate these cases with a similar lawsuit brought by another group of authors, including Michael Chabon, Ta-Nehisi Coates, and Andrew Sean Greer.
While the vicarious infringement claim was dismissed, OpenAI continues to confront allegations of violating unfair competition law by incorporating copyrighted material without proper authorization. This ruling mirrors a case brought by Silverman against Meta, where the judge broadly sided with Meta but allowed the claim of direct copyright infringement to proceed to the discovery phase.
Forum Shopping and Future Developments
Last week, the group of authors involved in the California lawsuit sought to halt a similar suit in New York, accusing OpenAI of “forum shopping for the most favorable schedule.” This legal maneuvering underscores the complex nature of the litigation surrounding AI training using copyrighted material.
In August, it was revealed that over 170,000 books, including works by Zadie Smith, Stephen King, Rachel Cusk, and Elena Ferrante, were used to train Meta’s LLaMA and possibly other generative-AI tools. OpenAI’s practices have come under scrutiny, with allegations that “shadow libraries” like Library Genesis (LibGen) were sources for the vast material used in training ChatGPT.
Judge Martínez-Olguín has given the authors until March 13 to amend their complaint, a sign that the legal battle between content creators and AI companies over the use of copyrighted works in training artificial intelligence models is far from over.