AI models trained on copyrighted material create legal and regulatory risk for deploying organisations. The EU AI Act requires the AISDP to document the copyright status of training data, the legal basis for its use, and procedures for handling rights holder claims.
Copyright and intellectual property risk is a critical consideration when selecting AI models for deployment under the EU AI Act. The training data used to develop large language models and generative AI systems frequently includes copyrighted text, images, audio, and other works, and the legality of this practice is being challenged in courts across multiple jurisdictions. For high-risk AI systems, the AI System Description and Performance document must record the copyright status of the training data, the legal basis relied upon for processing it, and the measures taken to respect rights holder opt-outs under Directive (EU) 2019/790. Where organisations deploy third-party pre-trained models, contractual representations about training data copyright status should be obtained from providers. Unqualified or unavailable representations must be recorded in the risk register and assessed for regulatory and reputational impact.
The training data used to develop AI models, particularly large language models and generative AI systems, may include copyrighted material. The legal landscape is evolving rapidly, with litigation in multiple jurisdictions challenging the legality of training on copyrighted content without a licence. For high-risk AI systems under the EU AI Act, the AISDP must document the copyright status of the training data used in the models the organisation deploys.
The AISDP must record the copyright status of the training data for each model deployed. This includes identifying whether the training data includes copyrighted text, images, audio, or other works, and the legal basis relied upon for processing that material. Acceptable legal bases include licence, consent, the text and data mining exception under Directive (EU) 2019/790, or another recognised basis. The documentation must also cover the measures taken to identify and exclude material where the rights holder has exercised an opt-out under the Directive.
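The items listed above can be captured as a structured record per deployed model. The following is a minimal sketch, not a prescribed AISDP schema: the class name `TrainingDataCopyrightRecord`, its fields, and the `gaps` check are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class LegalBasis(Enum):
    """Legal bases named in the text; the enum itself is an illustrative assumption."""
    LICENCE = "licence"
    CONSENT = "consent"
    TDM_EXCEPTION = "text and data mining exception, Directive (EU) 2019/790"
    OTHER = "other recognised basis"


@dataclass
class TrainingDataCopyrightRecord:
    """Hypothetical per-model record of training data copyright status."""
    model_name: str
    contains_copyrighted_works: bool
    work_types: List[str] = field(default_factory=list)        # e.g. "text", "images"
    legal_basis: Optional[LegalBasis] = None
    opt_out_measures: List[str] = field(default_factory=list)  # how opt-outs are identified and excluded

    def gaps(self) -> List[str]:
        """Return the documentation items that are still missing."""
        missing = []
        if self.contains_copyrighted_works:
            if not self.work_types:
                missing.append("types of copyrighted work not identified")
            if self.legal_basis is None:
                missing.append("legal basis not recorded")
            if not self.opt_out_measures:
                missing.append("opt-out measures not documented")
        return missing


record = TrainingDataCopyrightRecord(
    model_name="example-model",  # placeholder name
    contains_copyrighted_works=True,
    work_types=["text"],
    legal_basis=LegalBasis.TDM_EXCEPTION,
)
print(record.gaps())
```

A record that reports no gaps is only complete in form; whether the stated legal basis actually holds remains a legal question outside the scope of this sketch.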
The AISDP must document the procedures for responding to copyright claims from rights holders. This means establishing a clear process for receiving, assessing, and acting upon claims that copyrighted material has been used in the training data of a deployed model. Organisations should ensure that these procedures are proportionate to the scale and nature of the AI system's use of third-party content, and that they can demonstrate compliance with applicable copyright law when challenged.
For systems incorporating pre-trained models from third parties, the organisation should obtain contractual representations regarding the copyright status of the model's training data. These representations should cover the legal basis on which the training data was collected and processed, whether any rights holder opt-outs have been respected, and the provider's procedures for handling copyright claims. Contractual protections provide a documented chain of accountability that the AI System Assessor can reference in the AISDP.
Where contractual representations from the model provider are unavailable or qualified, the AI System Assessor records the risk in the risk register and assesses it for potential regulatory and reputational impact. A qualified representation (for example, one that disclaims liability for a subset of training data) indicates that the organisation cannot fully rely on the provider's assurances. The risk assessment should consider the severity of potential infringement, the likelihood of claims being brought, and the reputational consequences for the deploying organisation.
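One way to make the three assessment factors comparable across models is a simple risk register entry. The sketch below is illustrative only: the 1 to 5 rating scale, the class name `CopyrightRiskEntry`, and the scoring rule (likelihood multiplied by the worse of the two impact ratings) are assumptions, not a method prescribed by the EU AI Act or by this guidance.

```python
from dataclasses import dataclass

# Assumed status labels for a provider's contractual representation.
VALID_STATUSES = ("full", "qualified", "unavailable")


@dataclass
class CopyrightRiskEntry:
    """Hypothetical risk register entry for a third-party model."""
    model_name: str
    representation_status: str  # one of VALID_STATUSES
    severity: int               # severity of potential infringement, 1 (low) to 5 (high)
    likelihood: int             # likelihood of claims being brought, 1 to 5
    reputational_impact: int    # reputational consequences, 1 to 5

    def __post_init__(self) -> None:
        if self.representation_status not in VALID_STATUSES:
            raise ValueError(f"unknown status: {self.representation_status}")
        for name in ("severity", "likelihood", "reputational_impact"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    def requires_register_entry(self) -> bool:
        """Qualified or unavailable representations are recorded in the register."""
        return self.representation_status in ("qualified", "unavailable")

    def score(self) -> int:
        """Assumed scoring rule: likelihood times the worse of the two impacts."""
        return self.likelihood * max(self.severity, self.reputational_impact)


entry = CopyrightRiskEntry(
    model_name="example-model",  # placeholder name
    representation_status="qualified",
    severity=4,
    likelihood=2,
    reputational_impact=3,
)
print(entry.requires_register_entry(), entry.score())
```

The score is an ordering aid for review, not a substitute for the assessor's judgement; an organisation with an existing risk methodology would use its own scale and thresholds.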
Organisations can reduce copyright risk through several practical measures. Input filtering and output monitoring can detect and flag content that closely resembles known copyrighted works. Regular audits of the model provider's copyright compliance documentation help ensure that representations remain current. Where the model provider updates the training data or model version, the organisation should reassess the copyright risk and update the AISDP accordingly. These controls are particularly important where the legal basis for training data processing is uncertain or contested.
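As an illustration of output monitoring, the sketch below flags model outputs that share a large proportion of word sequences with a reference text. It assumes the organisation holds a small set of reference works to compare against. The function names, the 5-word window, and the 0.5 threshold are hypothetical choices, and word n-gram overlap is only one simple similarity measure among many.

```python
from typing import Dict, List, Set, Tuple


def word_ngrams(text: str, n: int = 5) -> Set[Tuple[str, ...]]:
    """Return the set of lowercase word n-grams in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(output: str, reference: str, n: int = 5) -> float:
    """Fraction of the output's n-grams that also occur in the reference."""
    output_grams = word_ngrams(output, n)
    if not output_grams:
        return 0.0  # output shorter than n words: nothing to compare
    return len(output_grams & word_ngrams(reference, n)) / len(output_grams)


def flag_output(
    output: str,
    references: Dict[str, str],
    threshold: float = 0.5,  # assumed cut-off, to be tuned per deployment
    n: int = 5,
) -> List[str]:
    """Return the titles of reference works the output closely resembles."""
    return [
        title
        for title, text in references.items()
        if overlap_ratio(output, text, n) >= threshold
    ]


# Placeholder reference text, not a real protected work.
references = {
    "sample work": "the quick brown fox jumps over the lazy dog near the river bank",
}
print(flag_output("the quick brown fox jumps over the lazy dog", references))
```

A flagged output is a prompt for human review, not a finding of infringement. Exact word matching will also miss paraphrased or translated reproductions, so a production control would likely need a more robust similarity method.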
Acceptable legal bases include licence, consent, the text and data mining exception under Directive (EU) 2019/790, or another recognised basis. The AISDP must document which basis applies to the training data used.
Where a model provider's representation is qualified or unavailable, the organisation cannot fully rely on the provider's assurances. The AI System Assessor records the resulting risk in the risk register and assesses it for potential regulatory and reputational impact.
Directive (EU) 2019/790 provides an exception for text and data mining, but rights holders can exercise an opt-out. The AISDP must document the measures taken to identify and exclude material where the opt-out has been exercised.
AI training data frequently includes copyrighted material, and litigation is challenging the legality of training without licence across multiple jurisdictions.
The AISDP must record copyright status, legal basis for processing, opt-out measures, and procedures for responding to rights holder claims.
Organisations should obtain contractual representations covering copyright status, legal basis, opt-out compliance, and claims handling from model providers.
Where provider representations are unavailable or qualified, the AI System Assessor records the risk in the risk register and assesses it for regulatory and reputational impact.
CTO of Standard Intelligence. Leads platform engineering and contributes to the PIG series technical content.