Court documents reveal internal discussions at Meta about using copyrighted works to train its AI models. In them, employees acknowledge the legal implications of using such content yet suggest acquiring it through questionable means, such as buying e-books at retail or potentially using Libgen, a controversial website that has faced lawsuits and copyright infringement allegations.
Meta executives appear to have been aware that not using Libgen could put the company at a competitive disadvantage. They proposed “mitigations” to reduce legal exposure, including filtering pirated content and concealing the use of Libgen datasets.
The company also explored scraping Reddit data, despite the platform’s announcement that it would charge AI companies for access to its data. Meta’s leadership considered overriding previous decisions on training sets, including decisions about using Quora content or licensed materials, to ensure sufficient training data.
The plaintiffs in the Kadrey v. Meta case allege that Meta cross-referenced pirated books with licensed ones to determine whether to pursue licensing agreements. Meta’s defense team has recently added Supreme Court litigators, indicating the high stakes involved in this legal battle.
Original source: TechCrunch