LLMs memorize whole copyrighted books.
A recent study showed that large portions of popular books can be extracted from all four major chatbots it tested.
The researchers gave each chatbot only the first pages of a book as input and were then able to systematically extract large parts of the book.
“Claude 3.7 Sonnet outputs entire books near-verbatim.” The model reproduced 95.8% of Harry Potter and the Philosopher’s Stone after receiving only the first pages as input.
“We find that [it] is possible to extract large portions of memorized copyrighted material from all four production LLMs”.
The researchers showed that this works for multiple popular books, which is further evidence that AI models are trained on copyrighted material and memorize it.
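
To make a figure like the 95.8% above more concrete, here is a minimal Python sketch of one way to estimate how much of a book reappears in a chatbot's output. It is not the metric used in the study: the function name `verbatim_coverage` and the `min_words` threshold are assumptions made up for illustration, and the matching relies only on Python's standard `difflib` module.

```python
from difflib import SequenceMatcher


def verbatim_coverage(reference: str, output: str, min_words: int = 5) -> float:
    """Return the fraction of the reference's words that reappear in the
    output as part of an identical run of at least `min_words` words.

    Illustrative assumption only, not the study's measurement.
    """
    ref_words = reference.split()
    out_words = output.split()
    if not ref_words:
        return 0.0

    # Compare word sequences; autojunk=False keeps frequent words
    # (such as "the") from being discarded on long texts.
    matcher = SequenceMatcher(None, ref_words, out_words, autojunk=False)

    # Count only matching runs long enough to look like copying
    # rather than coincidental overlap of a few common words.
    covered = sum(
        block.size
        for block in matcher.get_matching_blocks()
        if block.size >= min_words
    )
    return covered / len(ref_words)
```

Calling `verbatim_coverage(book_text, chatbot_output)` returns a value between 0 and 1. A higher `min_words` makes the check stricter, because short runs of shared words no longer count as reproduced text.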
