Over 49,000 pirated Spanish-language books used to train artificial intelligence models

Amid growing concerns about the use of pirated books and other copyrighted works for AI training, governments worldwide are grappling with how to protect the rights of publishers and creatives. Spain's publishing industry has drawn attention to the issue: the Federation of Spanish Editorial Guilds has warned that, without proper regulation, AI-generated pieces could be mistaken for human-made work.

A report published by the Danish Rights Alliance in September 2024 revealed the alleged use of pirated books for AI training, with nearly 50,000 books and works belonging to at least 41,000 authors and 1,100 different publishers said to have been used illegally. The Russian piracy website Libgen, which has been ordered to shut down and has had its domain suspended in much of Europe, is said to have been the source of the pirated Spanish works used to train AI models from companies such as OpenAI (the maker of ChatGPT) and Meta.

In the United States, the debate over AI copyright infringements is intense and evolving. The U.S. Copyright Office has taken a firm stance against considering the use of copyrighted materials for AI training as "fair use," signaling that such use could violate copyright law. This position has sparked controversy and criticism from proponents of AI innovation who argue that strict copyright constraints could hamper progress and competitiveness.

To manage such concerns, some U.S. states have begun enacting laws with clear rules about AI-generated content ownership and the lawful acquisition of training data. For example, Arkansas passed legislation clarifying that ownership of AI-generated content and trained models depends on lawful data use and specifying employer rights for AI used in work settings. Meanwhile, New York’s pending “Responsible AI Safety and Education Act” aims to impose transparency and safety requirements on large-scale AI models and developers.

Internationally, Spain's publishing industry has publicly raised concerns about AI systems being trained on pirated books, although specific Spanish government initiatives have not been detailed. This reflects a growing global trend in which creative industries are pushing for stronger enforcement against copyright infringement in AI training datasets.

The EU has taken steps toward legislation on AI, passing the first regulatory law concerning AI in May of the previous year. In response to the globalisation of AI, the measures under discussion range from stricter enforcement and clearer legal frameworks around training data to proposed moratoria on fragmented state AI regulations that might complicate the market.

CEDRO, the Spanish Intellectual Property Rights Management Entity, has urged the government to provide authors and publishers with tools and resources to manage their own intellectual property and combat the growing "copying culture." Jorge Corrales, Director of CEDRO, criticized the current government regulations concerning AI as "totally defective" and said they reflect a lack of effort to keep up with technology.

The government is expected to play a crucial role in finding a long-term solution that helps publishers and creatives manage their copyright licensing while adapting to the international use of AI. Affected Spanish publishers include Grupo Planeta, Acantilado, Anagrama, Libros del Asteroide, and the RAE, and notable Spanish authors such as Almudena Grandes, Arturo Pérez-Reverte, Fernando Aramburu, Dolores Redondo, Lorenzo Silva, María Dueñas, and Eduardo Mendoza are also affected.

As the use of AI continues to grow, it is clear that governments must find a way to balance the need for innovation with the protection of intellectual property rights. The ongoing tension suggests that policies will continue to evolve to address illicit use of copyrighted materials, such as pirated books in AI training, with a combination of legislative action, regulatory oversight, and possibly litigation shaping the landscape in countries like Spain and beyond.

The ongoing debate in the United States about AI copyright infringement shows that governments across the globe, including Spain's, are increasingly focusing on the use of copyrighted material for AI training, as the misuse of pirated books poses a threat to the publishing industry and to creatives' rights.

In an effort to protect publishers and creatives from the unauthorized use of their works in AI training, various governments, including those of the United States and the European Union, are exploring legislation and policy changes for the technology industry, aiming to strike a balance between fostering innovation and upholding intellectual property rights.
