Outline
- Increasing number of copyright claims in
- Increasing trend of countries introducing exceptions in their copyright laws enabling machine learning / text and data mining.
- Indian copyright law does not include specific safeguards for machine learning or text and data mining and the general exceptions under the law are limited.
- Due diligence on the terms of the platforms/databases where the content resides becomes important for every participant in the value chain, whether a content creator or an AI product developer.
Introduction
With the increasing use of AI and generative AI, human ingenuity is being challenged by these rapidly advancing technologies. In
TDM
Collecting training data through text and data mining (TDM) is the foundational step for any AI model. TDM is the systematic collection of extensive digitized material, coupled with the use of software to analyze and extract valuable information from this corpus.6 It involves web scraping, web crawling, and web archiving, amongst other things. The EU Directive on Copyright describes TDM as "New technologies that enable the automated computational analysis of information in digital form, such as text, sounds, images or data."7 According to the EU Directive, TDM makes it possible to process large amounts of information to gain new knowledge and discover new trends. While TDM finds application in several non-AI contexts,8 this writeup focuses on TDM employed for training AI models.
Use Cases
A. Machine learning, deep learning, pattern recognition without reproducing in-copyright works in generative output
TDM involves collecting and cleaning data for analysis, pattern recognition, deep learning, etc. This includes making a copy of the data to be studied and subsequently transferring it to a tool for examination.9 Reproducing an in-copyright work is an exclusive right vested in the copyright owner; anyone else may do so only with the owner's permission or under an exception in Indian copyright law.
One of the claims made in the NYT Complaint is that making an unauthorized copy of in-copyright works for machine learning amounts to copyright infringement.10 In earlier cases unrelated to AI,11 US courts have supported the view that copying in-copyright texts in TDM for research purposes is fair use. Whether that reasoning extends to AI is yet to be determined. Countries like
In
Keeping in mind the narrow exceptions under Indian copyright law, it would be prudent to evaluate certain aspects of the TDM activity, for example (i) the purpose or use of the TDM, and whether that purpose falls within any of the exceptions; and (ii) the terms and conditions of the databases or datasets used for machine learning or TDM.
B. Machine learning, deep learning, pattern recognition with use of in-copyright works in generative output
In generative AI, the training data may also be reproduced while generating responses, solutions or services. Such reproduction in output could trigger the rights of copyright holders, such as reproduction rights, communication rights and adaptation rights. In the NYT Complaint, the NY Times claims that "the current GPT-4 LLM will output near-verbatim copies of significant portions of Times Works when prompted to do so. Such memorized examples constitute unauthorized copies or derivative works of the Times Works used to train the model."22 Other cases in
Under Indian law, the analysis would hinge on where the generative output falls on the copyright spectrum: full reproduction, adaptation or derivative work, or new original work. The test commonly used by Indian courts is whether the work is substantially similar to the in-copyright training data. If there is substantial similarity, the output would be considered infringing unless it falls within the statutory exceptions, which, as observed in section A, are limited. Courts have looked at (i) the quality of the content copied as opposed to its quantity;24 (ii) the 'total concept and feel' test, where the determination is based on whether a reader, spectator, or viewer, after experiencing both works, unmistakably perceives the subsequent work as a copy of the original;25 and (iii) the abstraction-filtration-comparison test, which involves abstracting the core ideas of the works, filtering out unprotectable elements, and comparing the remaining protected elements to assess whether infringement has occurred.26
Our Lens
An increasing number of countries are amending their copyright laws to include TDM-related exceptions, some wider than others. These changes are being made so that those countries can participate, and stay ahead, in the building and adoption of AI models. In
Footnotes
1. Hereinafter "NY Times".
2. The New York Times Vs.
3. Authors Guild Vs. OpenAI and Ors. (2023); Andersen Vs. Stability AI (2023).
4. (Hereinafter "TDM") 'Factsheet on Copyright Act 2021' (
5.
6. 'Text and Data Mining - What is TDM?' (
7. 'EU Directive on Copyright' (
8. For example, TDM finds application in scientific research for efficient literature analysis and in business intelligence for market trend identification and legal compliance research. The use of AI is not necessary for such analysis. The term "TDM" was coined in 1999 by
9. 'Text and Data Mining' (
10. (n 2).
11.
12. (n 4).
13. (n 4).
14. Section 244, Singapore Copyright Act 2021.
15. Articles 3 and 4, 'EU Directive on Copyright' (
16. Section 52, Indian Copyright Act.
17.
18. Ibid.
19. Section 52, Indian Copyright Act.
20. Civic Chandran Vs.
21.
22. (n 2).
23. '
24. R.G. Anand Vs.
25. Ibid.
26. Shamoil Ahmad Khan Vs.
27. An example of the same is the software related amendments made to the Copyright Act in 1994.
28. YouTube, TikTok, and Instagram.
The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.
Veritas Legal
© Mondaq Ltd, 2024 - http://www.mondaq.com