Use of copyrighted material to train generative AI – is it fair dealing?

I have an interest in how copyright applies to fan works (such as fanfiction and fan art) and was doing some research on the topic when I ran across the following article:

In essence, George R.R. Martin, the author of the acclaimed A Song of Ice and Fire book series (which was adapted into the hit HBO series Game of Thrones), has joined numerous other authors in a class-action lawsuit against OpenAI for copyright infringement. The authors allege that OpenAI illegally copied their copyrighted works to use as training material for ChatGPT, OpenAI’s AI-powered language model. The authors are seeking an injunction to stop OpenAI from continuing to use their works, along with monetary damages.

I thought this case was interesting because the conversation surrounding copyright and AI is usually focussed on ownership of the works produced by generative AI. Personally, I had not put much thought into the massive amounts of information used to train AI models and the associated copyright implications.

After some more reading, I found another article regarding the use of copyrighted material for AI training:

The article revolves around an open letter from several media organizations urging lawmakers to legislate transparency into training datasets and to require the consent of copyright holders before their data is used for AI training.

This brings up some interesting questions. The development and research of generative AI require large datasets, presumably including massive amounts of material subject to copyright. Imposing licensing or consent requirements on the use of all copyrighted material may dramatically increase costs and slow down the improvement of current AI models. One of the commonly cited arguments for copyright protection is that it encourages innovation by guaranteeing economic reward for creators. However, this may be one of the instances where copyright protection stifles innovation.

It is also interesting to consider whether the use of copyrighted material to train AI may fall under fair dealing if Canadian copyright law applies. After all, AI development likely falls under research and/or education purposes. In addition, the companies are using the materials purely for research and development rather than for distribution. Although the amount of material used is large, it is arguably necessary to develop a useful AI model. Finally, it is highly questionable whether the use of the copyrighted works will have any financial impact on the copyright holders. After all, people are still waiting on The Winds of Winter after many years despite the abundance of fanfiction attempting to finish the book series (in particular after the TV show’s conclusion, which left many people unsatisfied).

3 responses to “Use of copyrighted material to train generative AI – is it fair dealing?”

  1. nehagupt

    Hi Amy,

    Great post! I totally agree with you – I also find that the conversation surrounding AI tends to focus on who owns the copyright on work generated by AI, while ignoring the copyright of the work used to develop an AI’s database.

    After reading your post, I decided to do some research and found that the Writers Guild in the United States brought up similar concerns. In their negotiations, the Guild included anti-AI stipulations to, one, prevent studios from using AI to produce literary material and, two, ensure that AI is not trained on material created by Guild members. It seems that by allowing AI to essentially produce work based on existing work, authors/writers are not only losing their ability to control how their work is being used but are also losing their ability to make a living off of their work.

    I find this quite interesting because, at its very core, AI is unable to create anything unique or intrinsically valuable because everything that is generated by AI is a derivative of existing work. So, AI companies, like OpenAI, are essentially being allowed to generate work by copying existing material, and then people and studios are able to make a living off of this copyrighted work without giving the actual creators their due. Thus, if anything, AI acts as the ultimate middleman.

    However, what happens if humans stop creating and there is nothing new to feed AI? What happens to AI’s generative power? Is all this then worth it?


  2. Amy Kang

    Thanks for the comment and the article! You brought up some really interesting questions.

    I don’t think I should have been so brash when I said it is questionable whether AI will have a financial impact on the authors’ works. I do appreciate the fear of AI leading to writers’ job loss. In addition, I recognize that this may impact some industries more than others. For example, ChatGPT is currently much better at generating fact-based articles than at writing compelling fictional stories. As such, there is certainly a question to be asked about whether AI might financially compete with copyright holders, particularly in the future when the AI model improves. However, as of now, I do think it would be difficult for the authors to establish that AI is competing in the market for their works, both due to the current quality of AI writing and the lack of evidence that people are gravitating towards AI-generated content. In addition, I think people who like the works of an author will still read the works of that author no matter how much derivative content is out there (whether by AI or by other humans). For example, authors generally view fanfiction and fan art as good publicity for their works. But again, I recognize this might not apply to all writers.

    In addition, AI, like any tool, can be abused. OpenAI is using the works of the authors purely for the purpose of developing its AI model, not for replacing writers. It is other individuals who are using the tool for more nefarious purposes such as spamming Amazon with AI-written books. In other words, from OpenAI’s perspective, the reproduced work is only used for training AI, while other users are using the product of the training to generate content which may infringe on the economic or moral rights of the author. I do wonder how this might factor into the fair dealing analysis. Should it be the responsibility of OpenAI to pay for how its tool may be used by its end-users?

  3. Emma Lam

    Hi Amy,

    Fascinating post! I think it’s absolutely imperative that someone develop a sophisticated AI well-versed in GRRM’s writings and the world of ice and fire. As much as I don’t want it to be the case, there’s a good chance that GRRM might never finish the series, and fans need something!

    Kidding aside, I’m also wondering how copyright defences work when the usage changes. Initially, OpenAI’s use of text to develop ChatGPT could very well fall under some of the fair dealing categories (be it research, education, or private study). But as with any technology, once it is mature enough, it will be commercialized, which makes the use look a lot less like fair dealing. What happens, or what should the law do, once the usage changes? Does a technology get the fair dealing defence while it is not yet commercialized, and then need to start paying the authors once it’s mature enough to be profitable?