Text And Data Mining – Decoding Copyright Challenges In India

April 25, 2024 at 03:43 pm IST

Outline

An increasing number of copyright claims in the United States and other countries challenge the use of in-copyright works in machine learning and in AI-generated output.

Increasing trend of countries introducing exceptions in their copyright laws enabling machine learning / text and data mining.

Indian copyright law does not include specific safeguards for machine learning or text and data mining and the general exceptions under the law are limited.

Due diligence on the terms of the platforms/databases where the content resides becomes important for participants in the value chain, whether they are content creators or AI product developers.

Introduction

With the increasing use of AI and generative AI, human ingenuity is being challenged by these rapidly advancing technologies. In December 2023, the New York Times¹ sued OpenAI and Microsoft in the United States for the alleged infringing use of its copyright works.² The primary claim made by the NY Times is that millions of its articles were used to train chatbots that now compete with it. This legal battle is one among many copyright claims against OpenAI and other AI developers, including actions brought by numerous authors and artists.³ While the law on the use of in-copyright works in training data continues to develop in the United States, jurisdictions such as Japan, Singapore and the EU have included limited exceptions in their copyright laws to enable text and data mining.⁴ Closer to home, a pivotal question looms: how does copyright law in India balance the interests of copyright owners on the one hand and the enabling of machine learning and AI on the other? According to a recent press release, the GOI⁵ has expressed confidence in the adequacy of the copyright laws to address concerns surrounding AI-generated works and related innovations. This write-up looks at this question under Indian law.

TDM

Assembling training data through TDM is the foundational step for any AI model. TDM is the systematic collection of extensive digitized material, coupled with the use of software to analyze and extract valuable information from this corpus.⁶ It involves web scraping, web crawling, and web archiving, among other things. The EU Directive on Copyright describes TDM as "New technologies that enable the automated computational analysis of information in digital form, such as text, sounds, images or data."⁷ According to the EU Directive, TDM makes possible the processing of large amounts of information to gain new knowledge and discover new trends. While TDM finds application in several non-AI⁸ contexts, this write-up focuses on TDM employed for training AI models.

Use Cases

A. Machine learning, deep learning, pattern recognition without reproducing in-copyright works in generative output

TDM involves collecting and cleaning data for analysis, pattern recognition, deep learning, etc. This includes making a copy of the data to be studied and subsequently transferring it to a tool for examination.⁹ Reproducing an in-copyright work is an exclusive right of the copyright owner, so making such a copy requires either the owner's permission or an exception under Indian copyright law.

One of the claims made in the NYT Complaint is that making an unauthorized copy of in-copyright works for machine learning amounts to copyright infringement.¹⁰ In previous non-AI cases¹¹, US courts have supported the view that copying in-copyright texts in TDM for research purposes is fair use. These claims are yet to be determined in the context of AI. Countries like Singapore¹² and Japan¹³ have reduced the uncertainty by introducing exceptions in their copyright laws that permit copying of in-copyright works for machine learning, pattern recognition and data verification, subject to conditions.¹⁴ The EU Directive on Copyright directs member states to allow reproduction of in-copyright works for TDM (i) by research organizations and cultural heritage institutions, for the purpose of scientific research; and (ii) for all other purposes, on the condition that the right holder has not opted out of such use of their work.¹⁵

In India, there are no specific exceptions for copying or reproduction of in-copyright works for machine learning purposes. Therefore, the use cases need to fall within the ambit of existing exceptions under the copyright law, such as fair dealing.¹⁶ The scope of fair dealing in India is narrow and applies to literary, dramatic, musical, or artistic works.¹⁷ Sound recordings and cinematograph films fall outside the scope of fair dealing.¹⁸ Only uses such as private or personal use (including research), criticism, or review that satisfy the test of fair dealing are not considered infringement.¹⁹ The courts have traditionally looked at the following three factors in deciding what is fair dealing of an in-copyright work: (i) the amount and substantiality of the portion used; (ii) the purpose and character of the use; and (iii) the effect on the potential market.²⁰ Courts have held that if the purpose of the use is commercial in nature, it is not considered private or personal use and thus falls outside the scope of fair dealing.²¹

Given the narrow exceptions under Indian copyright law, it would be prudent to evaluate certain aspects of the TDM activity, for example (i) the purpose or use of the TDM and whether any such purpose falls within the exceptions; and (ii) the terms and conditions of the databases or datasets used for machine learning or TDM.

B. Machine learning, deep learning, pattern recognition with use of in-copyright works in generative output

In generative AI, the training data may also be reproduced while generating responses, solutions or services. Such reproductions in output could trigger the rights of copyright holders, such as reproduction rights, communication rights and adaptation rights. In the NYT Complaint, the NY Times claims that "the current GPT-4 LLM will output near-verbatim copies of significant portions of Times Works when prompted to do so. Such memorized examples constitute unauthorized copies or derivative works of the Times Works used to train the model."²² Other cases in the United States have made similar claims. In October 2023, Universal Music Group filed a copyright infringement lawsuit against Anthropic AI alleging that the AI is "copying and distributing lyrics from over 500 songs by renowned artists such as Katy Perry, the Rolling Stones, and Beyoncé."²³

Under Indian law, the analysis would hinge on where the generative output falls on a spectrum running from full reproduction, through adaptation or derivative work, to a new original work. The test commonly used by Indian courts has been whether the work is substantially similar to the in-copyright learning data. If there is substantial similarity, it would be considered infringement unless it falls within the statutory exceptions, which, as we observed in section A, are limited. Courts have looked at (i) the quality of the content copied as opposed to its quantity;²⁴ (ii) the 'total concept and feel test', where the determination is based on whether a reader, spectator, or viewer, after experiencing both works, unmistakably perceives the subsequent work as a copy of the original;²⁵ and (iii) the abstraction-filtration-comparison test, which involves analyzing works by abstracting their core ideas, filtering out unprotectable elements, and comparing the remaining protected elements to assess whether infringement has occurred.²⁶

Our Lens

We are seeing an increasing number of countries amending their copyright laws to include TDM-related exceptions, some wider than others. These changes are being made so that those countries can participate and stay ahead in the building and adoption of AI models. In India, there has been a history of exceptions being carved out to balance the rights of copyright holders and technological advancements.²⁷ The government's current stance, as articulated in the press release, indicates the absence of immediate plans to modify existing laws in the context of training data and AI. Would the existing exceptions support the increasing use of in-copyright works in training AI models for commercial use? Unlikely. Currently, individual participants in the value chain are left to determine how their works and databases are used and the commercial terms associated with such use.²⁸

Footnotes

1. Hereinafter "NY Times".

2. The New York Times v. Microsoft Corporation, OpenAI & Ors. (2023) (Hereinafter "NYT Complaint").

3. Authors Guild v. OpenAI and Ors. (2023); Andersen v. Stability AI (2023).

4. (Hereinafter "TDM") 'Factsheet on Copyright Act 2021' (Intellectual Property Office of Singapore, 24 November 2022); 'EU Directive on Copyright' (European Parliament, 2019); Articles 30-4, 47-4, 47-5 of the Japanese Copyright Law, 1970. See also: 'Japan Amends its Copyright Legislation to Meet Future Demands in AI and Big Data' (European Alliance for Research Excellence, 3 September 2018).

5. Government of India. See also: the Press Release.

6. 'Text and Data Mining - What is TDM?' (University of Cambridge); 'Text Data Mining: A Proposed Framework and Future Perspectives' (International Journal of Business Information Systems, 2015).

7. 'EU Directive on Copyright' (European Parliament, 2019).

8. For example, TDM finds application in scientific research for efficient literature analysis and in business intelligence for market trend identification and legal compliance research. The use of AI is not necessary for such analysis. The term "TDM" was coined in 1999 by Marti A. Hearst.

9. 'Text and Data Mining' (University of Birmingham).

10. (n 2).

11. Authors Guild, Inc. v. Google, Inc. (804 F.3d 202); Authors Guild, Inc. v. HathiTrust (902 F.Supp.2d 445).

12. (n 4).

13. (n 4).

14. Section 244, Singapore Copyright Act 2021.

15. Articles 3 and 4, 'EU Directive on Copyright' (European Parliament, 2019).

16. Section 52, Indian Copyright Act.

17. Super Cassettes Industries Limited and Ors. v. Chintamani Rao and Ors. 2011 SCC OnLine Del 4712.

18. Ibid.

19. Section 52, Indian Copyright Act.

20. Civic Chandran v. C. Ammini Amma, 16 PTC 329 Madras; Blackwood and Sons v. A.N. Parsuraman, AIR 1959 Madras 410.

21. Super Cassettes Industries Ltd. v. Hamar Television Network Pvt. Ltd. and Ors. 2011(45) PTC 70(Del); Tips Industries Ltd. v. Wynk Music Ltd. and Ors. 2019 SCC OnLine Bom 13087.

22. (n 2).

23. 'Universal Music files $75 million lawsuit against AI firm Anthropic for copying Rolling Stones, Beyonce lyrics' (The Economic Times, 20 October 2023).

24. R.G. Anand v. M/S Deluxe Films and Ors., AIR 1978 SC 1613.

25. Ibid.

26. Shamoil Ahmad Khan v. Falguni Shah & Ors. 2020 SCC OnLine Bom 665; See also: The "Abstraction, Filtration, Comparison" Test (Ladas & Perry LLP).

27. An example of the same is the software related amendments made to the Copyright Act in 1994.

28. YouTube, TikTok, and Instagram.

The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.

Anju Jain Kumar
Veritas Legal
Forbes Building, 1st Floor
Charanjit Rai Marg, Fort
Mumbai
400 001
INDIA
