2 authors say OpenAI 'ingested' their books to train ChatGPT. Now they're suing, and a 'wave' of similar court cases may follow.

BUSINESS INSIDER

Two award-winning authors recently sued OpenAI, accusing the generative-AI bastion of violating copyright law by using their published books to train ChatGPT without their consent.

OpenAI is again facing scrutiny over its data-collection practices. This time, it’s from authors.
Two writers are suing OpenAI, accusing the company of ingesting their books to train ChatGPT.
A law professor anticipates more lawsuits involving copyright law and generative AI in the future.

Filed in late June, the lawsuit claims that ChatGPT’s underlying large language model “ingested” the copyrighted work of the case’s plaintiffs, authors Mona Awad and Paul Tremblay. They argue that ChatGPT’s ability to produce detailed summaries of their works indicates their books were included in datasets used to train the technology.

The suit is the latest example of tension between creatives and generative AI tools capable of producing text and images in seconds. Many workers in creative fields are concerned about how the fast-developing technology could affect their careers and livelihoods. And these concerns may increasingly manifest in legal challenges.

Daniel Gervais, a law professor at Vanderbilt University, told Insider that the writers’ lawsuit is one of a handful of copyright cases against generative AI tools nationwide. It won’t be the last, he added.

Gervais expects many more authors will sue companies developing large language models and generative AI as these programs advance and improve at replicating the style of writers and artists. He believes a deluge of legal challenges targeting the output of tools like ChatGPT nationwide is imminent.

“This one is really about the input,” Gervais said, speaking on the lawsuit’s allegations around AI data-scraping and training. “The output wave is coming as well.”

Proving that the authors incurred monetary damages from OpenAI’s data-collection practices, as the complaint alleges, may be challenging. Gervais told Insider that ChatGPT may have gleaned Awad and Tremblay’s work from sources other than the authors’ original material, but that it was possible the bot “ingested” their books, as the lawsuit claims.

Andres Guadamuz, an expert in AI and copyright at the University of Sussex, echoed this concern, telling Insider that even if the books are in OpenAI’s training datasets, the company could have obtained the work through the lawful collection of another dataset.

And showing that ChatGPT would have behaved differently had it never scooped up the authors’ work is unlikely to be possible, given the vast amount of data it scrapes off the web, Guadamuz told The Guardian.

The Authors Guild, a US-based advocacy group that supports the working rights of writers, published an open letter last week calling on the chief executives of Big Tech and AI companies to “obtain permission” from writers to use their copyrighted work in training generative AI programs and “compensate writers fairly.” The organization told Insider that its letter has garnered over 2,000 signatures.
