GEMA vs. OpenAI: The ruling that could change the way AI uses copyrighted works.
Gianpaolo Todisco - Partner
The relationship between artificial intelligence and copyright has become the focus of a global debate, yet until recently European courts had given no clear ruling on a fundamental question: can artificial intelligence platforms freely use copyrighted works to train their models?
With the recent decision by the Munich Regional Court in the case of GEMA v. OpenAI, this question has received its first clear answer. And it is by no means the answer that big tech had hoped for.
The lawsuit stems from allegations by GEMA, Germany’s leading collecting society, that OpenAI used thousands of copyrighted song lyrics to train its models without obtaining any authorization. The issue, therefore, does not concern the outputs generated by AI, but rather the most fundamental phase of the process: training. OpenAI, like many other companies in the sector, has always maintained that this phase is purely technical and therefore does not fall within the traditional categories of copyright law. This position has, until now, been met with a certain degree of regulatory tolerance, primarily because the phenomenon was new, elusive, and difficult to regulate.
The German court has decisively overturned this paradigm. The ruling states that training a language model constitutes, to all intents and purposes, an act of reproduction of the work and that, consequently, it falls within the scope of the author’s exclusive right. The fact that the operation is performed by an algorithm, in an automated and large-scale manner, does not alter the legal substance of the activity: if a text is copied, stored, analyzed, or reworked within a dataset, that use must be authorized.
That’s not all. The court clarified that, in this case, the European exceptions for text and data mining cannot be invoked. The DSM Directive (Directive (EU) 2019/790) opened up significant scope for text and data mining: Article 3 covers scientific research, while Article 4 extends to other purposes, including commercial ones, but on one essential condition: that the rightholder has not expressly reserved their rights. Many of the texts represented by GEMA, however, already carried a clear reservation of rights. According to the court, OpenAI should therefore have noticed this and taken steps to obtain the necessary licenses.
This decision appears set to mark a watershed moment. For years, artificial intelligence models have been trained using enormous amounts of data sourced online, often without distinguishing between free and copyrighted material. This practice was justified by the technical complexity of the processes and by the fact that no legislation had ever directly addressed the issue. Today, however, that “gap” is beginning to close. And the Munich ruling leaves no room for overly broad interpretations: using copyrighted works requires a license, just as it does for any other type of reproduction.
The potential implications for the industry are enormous. Large companies may be required to negotiate licensing agreements on a massive scale with national and international collecting societies, paving the way for an entirely new market for “AI royalties.” At the same time, pressure for greater transparency about the datasets used for training is likely to grow, an issue that has remained deliberately opaque for years. It is possible that in the future, platforms will be required to indicate which works they have used, under what authorizations, and with what limitations.
This is clearly a significant victory for authors and publishers. For the first time, a court has recognized that the value of creative works does not end with their public use, but continues to exist even in a new context such as the training of artificial intelligence. This could open the door to new forms of compensation and greater control over the use of content in the digital world.
The GEMA v. OpenAI case also comes at a time when several European courts are addressing similar issues and the Court of Justice of the European Union has been asked to rule on related matters. The sense is that we are at the beginning of a completely new phase in copyright law, in which the very concept of “reproduction” will have to be reinterpreted in light of machine learning technologies.
One thing, however, is already clear: the days when AI systems could train undisturbed on whatever they found online are over. The Munich court’s decision marks the entry of copyright into the control room of artificial intelligence. From here on, anyone who wants to develop increasingly powerful models will have to deal with authors, publishers, and collecting societies. It is no longer just a technical problem: it is a legal, economic, and cultural issue that defines how human creativity interacts with artificial creativity.