Meta, the company that owns Facebook, is facing criticism after researchers found that its Llama 3.1 AI model can reproduce large portions of copyrighted books, such as Harry Potter, The Hobbit, and 1984.
What Happened?
A group of researchers from Stanford, Cornell, and West Virginia University tested how much text five popular AI models had memorized from Books3, a large book dataset that includes many copyrighted books. They found that Llama 3.1, released in July 2024, had memorized far more of Harry Potter than the other models.
- Llama 3.1 memorized about 42% of the first Harry Potter book.
- "Memorized" here means the model could reproduce 50-token excerpts (a token is a word or part of a word) at least 50% of the time.
- In comparison, an earlier version called Llama 1, released in 2023, had memorized only 4.4% of the same book.
This suggests that Meta’s newer models can remember and reproduce more copyrighted material than its earlier ones.
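As a rough illustration of how a figure like 42% could be calculated, the sketch below counts the share of excerpts a model reproduces with at least a given probability. The function name, the 0.5 threshold, and the sample numbers are assumptions made for this example; they are not the researchers' code or data.

```python
def memorized_fraction(excerpt_probs, threshold=0.5):
    """Return the share of excerpts reproduced with probability >= threshold.

    excerpt_probs: one number per excerpt, the (hypothetical) chance that
    the model reproduces that excerpt exactly.
    """
    if not excerpt_probs:
        return 0.0
    hits = sum(1 for p in excerpt_probs if p >= threshold)
    return hits / len(excerpt_probs)


# Made-up probabilities for ten excerpts, for illustration only.
example_probs = [0.9, 0.7, 0.5, 0.2, 0.05, 0.6, 0.1, 0.3, 0.8, 0.01]

rate = memorized_fraction(example_probs)
print(f"{rate:.0%} of excerpts count as memorized")
```

With these made-up inputs, five of the ten excerpts meet the threshold. A real measurement would run the same count over every excerpt in a book.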
Why Is This Happening?
Experts say this may be because:
- The same books might have been used too often during training.
- The training data may have included fan sites, reviews, or academic content that quoted the books.
- Changes in the AI’s training process may have made it easier for the model to memorize exact text.
Why It’s a Big Deal
This raises serious concerns about how AI is trained and whether companies like Meta are breaking copyright laws. If these AI models can reproduce copyrighted content without permission, their makers could face legal trouble.
In December 2023, The New York Times sued OpenAI and Microsoft, saying their AI tools (like ChatGPT) were trained on copyrighted articles and could repeat them word-for-word or in a very similar style.
Now, Meta could face similar problems, especially if its AI is using and reproducing books that are still protected by copyright.
