LLMs memorize whole books

LLMs memorize whole copyrighted books.
A recent study showed that large portions of popular books can be extracted from all four production chatbots the researchers tested.

The researchers gave the chatbots only the first pages of a book as input and were then able to systematically extract large parts of the rest of it.

“Claude 3.7 Sonnet outputs entire books near-verbatim.” After receiving the first pages as input, it reproduced 95.8% of Harry Potter and the Philosopher’s Stone.

“We find that it is possible to extract large portions of memorized copyrighted material from all four production LLMs”.

The researchers showed that this works for multiple popular books, which is further proof that AI models are trained on copyrighted material and memorize it.

Source: https://arxiv.org/pdf/2601.02671v1

Discover more from Together Against AI