Can copyright-protected material be used to train AI models? The European TDM exception in the spotlight
Introduction: The Copyright Wars
If you have glanced at tech headlines lately, you will know that the battle lines are drawn. On one side: rightsholders, publishers like The New York Times, and creative minds such as Sarah Silverman. On the other: AI companies, including OpenAI, eager to train their models with as much data as possible. The burning question? Can copyright-protected material be used to train an AI model? The answer is, perhaps not surprisingly: “It depends.” Let us dive into the European legal jungle, with a special focus on the text and data mining (TDM) exception in the EU’s Directive (EU) 2019/790 (the Copyright Directive), its Swedish implementation, and the legal uncertainties that follow.
The TDM exception: A legal loophole or a license to train?
The Copyright Directive is implemented in Sweden through the Swedish Copyright Act. As a result, the Copyright Act now contains an exception that allows works protected by copyright to be used for TDM purposes under certain conditions, as long as the creator has not reserved such use in an appropriate manner.
Does this mean that copyright-protected works can be used to train generative AI models? This is a hotly debated topic. Some say the exception was not devised with generative AI in mind, but rather for more traditional machine learning models. On this view, applying the exception to generative AI would restrict copyright in a disproportionate way.
Others argue that the law is flexible enough to cover generative AI models. The Swedish Copyright Act’s definition of TDM is broad: “an automated technique used to analyze text and data in digital form for the purpose of generating information.” Comparing this wording with the definition in Article 2(2) of the Copyright Directive, on which the exception in Article 4 relies, it is clear that the examples given in the Copyright Directive (patterns, trends, or correlations) are not present in the Copyright Act. From a systematic perspective, this could mean that the Swedish implementation covers a wider array of actions. Does this mean that the training of generative AI models falls within the definition? As we will see below, there are several arguments to support such a conclusion. But first, how does the TDM exception work?
How does the exception work in practice?
Let us break down the four key conditions for using the TDM exception in Sweden:
- TDM Purposes: Perhaps self-explanatory, but the activity must consist of using an automated technique to analyze text and data in digital form in order to generate information.
- Lawful Access: You must have legally obtained the material used for the TDM purpose. This means no scraping behind paywalls or downloading from pirate sites. Open-access or purchased content only.
- Temporary Copies: Any copies of the works made during the process must be deleted once the mining is done. No hoarding for future use!
- Opt-Out Mechanism: The author can reserve their rights, for example, via machine-readable means or website terms. If they do, their works are out of bounds unless you get a license.
If you tick all these boxes, you can use copyright-protected works for TDM purposes according to the Swedish Copyright Act.
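For readers who think in code, the four conditions above can be expressed as a simple checklist. The following Python sketch is purely illustrative: the `TdmUse` record, its field names, and both functions are hypothetical names made up for this example, and whether each condition is actually met remains a legal assessment that no script can make for you.

```python
from dataclasses import dataclass


@dataclass
class TdmUse:
    """Hypothetical record describing one planned use of a work."""
    purpose_is_tdm: bool        # automated analysis in order to generate information
    lawful_access: bool         # the material was lawfully obtained
    copies_deleted_after: bool  # copies are deleted once the mining is done
    rights_reserved: bool       # the rightsholder has opted out


def failed_conditions(use: TdmUse) -> list[str]:
    """Return the conditions that are not met (empty list if all are met)."""
    failed = []
    if not use.purpose_is_tdm:
        failed.append("TDM Purposes")
    if not use.lawful_access:
        failed.append("Lawful Access")
    if not use.copies_deleted_after:
        failed.append("Temporary Copies")
    if use.rights_reserved:
        failed.append("Opt-Out Mechanism")
    return failed


def exception_may_apply(use: TdmUse) -> bool:
    """All four conditions must hold at the same time."""
    return not failed_conditions(use)
```

For example, a work that was lawfully accessed and mined with temporary copies only, but whose author has reserved their rights, would be reported as failing on "Opt-Out Mechanism" alone. In that situation a license is needed.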
What counts as “data mining”?
Now, to the most challenging part: does the training of generative AI fit within the definition of TDM?
As you may be aware, the EU recently adopted the AI Act, which regulates the provision of generative AI models. This Act contains references to the Copyright Directive that can be used to support the conclusion that the training of generative AI is indeed permissible on the basis of the TDM exception.
Firstly, in Recital 105 of the AI Act, it is acknowledged that the training of generative AI models requires access to vast amounts of information, which may interfere with copyright. In the same recital, reference is made to the TDM exception, emphasizing that under certain conditions, authorization from rightsholders is not necessary, unless the rightsholders have opted out of the TDM exception.
Secondly, in Article 53 of the AI Act, which contains obligations for providers of general-purpose AI models, another explicit reference is made to the TDM exception. The article states that providers shall put in place a policy for complying with EU copyright law, including identifying and complying with rightsholders’ opt-outs from the TDM exception. In other words, providers must ensure that opt-outs from rightsholders can be detected when scanning for useful material.
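To make the idea of detecting a machine-readable opt-out more concrete, here is a minimal Python sketch that checks a site's robots.txt rules before a page is collected. It rests on several assumptions: that the rightsholder has chosen robots.txt as the way to express the reservation (only one of several possible means, and website terms would not be caught by it), that the crawler identifies itself as `ExampleTrainingBot` (a hypothetical name), and that the sample rules and URL are invented for illustration. Whether a given signal counts as an appropriate reservation is a legal question, not a technical one.

```python
from urllib.robotparser import RobotFileParser


def is_opted_out(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules disallow this crawler for this URL."""
    parser = RobotFileParser()
    # Parse rules from a string so the example runs without network access.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Invented sample rules: the hypothetical training crawler is excluded,
# while all other crawlers are allowed.
SAMPLE_ROBOTS = """\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    page = "https://example.com/article"
    print(is_opted_out(SAMPLE_ROBOTS, "ExampleTrainingBot", page))
    print(is_opted_out(SAMPLE_ROBOTS, "OtherBot", page))
```

In a real pipeline, a check like this would run before any copy of the work is made, and a positive result would route the work to licensing rather than to the training set.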
In addition to the hints in the AI Act, and regardless of whether generative AI models were envisaged by the legislator when the TDM exception was drafted in 2019, the European Court of Justice has expressed the view that originally narrow exceptions can be interpreted more broadly to adapt to new technologies.[1]
Seen from a rightsholder’s perspective, an argument can be made that the TDM exception strikes a fair balance between the interests of AI operators and creators. The growing need for material to train AI models means that creators can leverage the opt-out mechanism to obtain lucrative license deals.
Considering the arguments above, as well as the broad definition of TDM, there is reason to include the training of generative AI models within the exception.
Conclusion: The debate continues, but the door is ajar
From the trenches of legal opinions and legislative texts, it seems possible to use copyright-protected material for AI training in Sweden, as long as all the conditions of the TDM exception are fulfilled. However, this question is not one to take lightly, and no guiding judgments on the subject have yet been delivered. The answer is therefore far from certain, and interpretations may shift. If you have any questions or would like to discuss this further, please don’t hesitate to reach out to someone on the team.
[1] See the judgment in the joined cases C‑403/08 and C‑429/08, para. 164.