AO3 vs AI: Are copyright claims the solution to unauthorized data scraping of fanfiction sites?

Fanfiction refers to creative fiction produced by fans of a particular original work that derives from its characters, plot, settings, or themes. Fanworks are characterized by their transformative nature as creative reinterpretations and expansions upon the original source material. They are non-commercial in nature, crafted out of a genuine love for the source content and a desire to share creative expressions within fan communities. The recent discovery that generative AI companies have tapped into fanfiction to train their models has prompted debates among fan creators on copyright, fair use/fair dealing, and potential methods of redress.

By way of background, one of the largest sites for hosting fanworks is archiveofourown.org (“AO3”), a non-commercial and not-for-profit project run by the Organization for Transformative Works (“OTW”). Earlier this year, an AI writing tool powered by OpenAI’s GPT-3 was revealed to have hyper-specific familiarity with an erotic fiction trope popular within fandom circles called the “Omegaverse”.[1] Because the Omegaverse is predominantly self-contained within niche fandom spaces, with vocabulary that would not organically appear in other areas of the Internet, many have concluded that these AI tools sourced their training data by scraping fanfiction sites like AO3. This is not surprising given that most large language models (LLMs) use some version of the CommonCrawl dataset, which has crawled 12 years’ worth of content on the publicly available Internet without differentiating between copyrighted and non-copyrighted content.

A screenshot of the home page of Archive of Our Own (AO3).

The revelation has incited widespread outrage among fan creators globally, with concerns echoing the sentiments expressed by the Writers Guild of America (WGA), highlighting the use of copyrighted or copyrightable material without authors’ consent. The backlash is fuelled by the nature of fanworks as non-commercial creations solely driven by enjoyment. In light of these concerns, the legal chair of OTW and law professor Betsy Rosenblatt has stated that while fanfiction may not be created for profit, it remains eligible for copyright claims.[2] While fanfiction is itself a derivative work, writers may still hold copyright over the original elements they contribute, such as characters, plot structures, and specific word choices.

It is true that, despite not residing in the public domain, fanfiction is likely protected by users’ rights provisions in copyright law. In Canada, fair dealing categories like parody, satire, criticism, and review receive legal protection from copyright infringement, and fanfiction may be covered under them.[3] Although fair dealing in Canada does not explicitly consider the “transformative” nature of a work as fair use does in the US, a fairness assessment would nonetheless take into account factors such as the absence of commercialization, the extent of original versus “reproduced” work, and the transformative nature of the fan-made work. Further, non-commercial user-generated content is protected as long as proper attribution is provided and the new work does not serve as a market substitute for the original.[4] As OTW itself has stated, fanworks are unlikely to be a form of copyright infringement.[5] Since registration is not required, fan creators would hold copyright in their literary or artistic works so long as the works are fixed and original.[6]

Does the use of fanworks to train AI prima facie infringe upon the rights of a copyright owner? Scraping content for use as training data likely constitutes reproduction in some form. However, establishing that a substantial part of any individual work was taken might pose a challenge. The black-box nature of most generative AI models may make it difficult to identify the specific influences that contributed to the creation of an AI-generated work. Moral rights could also come into play in Canada; these cover the author’s right to be associated with their work and the right to the integrity of the work.[7] Fan creators might have a claim against generative AI companies for using and distorting their work in the AI training process.

Would an AI company be able to raise fair use or fair dealing to justify its use of copyrighted works in training its models? In the US, Andy Warhol Foundation for the Visual Arts v. Goldsmith (2023) 598 U.S. ___ suggests that the appropriation of non-monetized copyrighted works to create market substitutes for original works (whether literary or artistic) may fall short of fair use.[8] Some oppose classifying AI-generated works as transformative because AI systems are dedicated to replication rather than creative innovation, and their proliferation would undermine the originality and value contributed by human creators in the fandom community. As for Canada, where no explicit category for transformative purposes exists, whether scraping data falls within a fair dealing category (potentially research or private study) remains an open question.

Still, it is worth noting that in the realm of fanworks, copyright claims may not be the most appropriate solution. Rosenblatt notes that while fanfiction is eligible for copyright protection, many writers choose not to pursue it due to factors like lack of knowledge, financial constraints, or a general unwillingness to navigate the legal complexities.[9] Frustration about their works being used to train AI has led to inventive responses by fanfiction writers, including a writing marathon called “Knot in My Name” that aims to flood platforms with Omegaverse content to corrupt the AI systems.[10] The OTW has suggested protective measures, such as restricting works to AO3 users only, and has implemented code to deter large-scale scraping. Amidst these discussions, a commenter on the OTW forum post challenged the community’s tendency to equate AI-generated content with theft, highlighting that both AI and fanfiction authors create new works based on existing material without explicit permission.[11] In light of the legal gray area that fanworks generally occupy, the commenter questioned whether stricter copyright enforcement aligns with the best interests of fan communities, suggesting that such measures might inadvertently pose a danger to all fanworks.

This case underscores the need for nuanced approaches that consider the unique landscape of fanworks and AI-generated content, suggesting that alternative forms of redress may be more appropriate than copyright claims in addressing the evolving intersection of creativity and technology.


[1] https://www.reddit.com/r/AO3/comments/z9apih/sudowrites_scraping_and_mining_ao3_for_its/; https://www.wired.com/story/fanfiction-omegaverse-sex-trope-artificial-intelligence-knotting/

[2] https://www.arl.org/blog/applying-intellectual-property-law-to-ai-an-interview-with-betsy-rosenblatt/

[3] Sections 29, 29.1, Copyright Act.

[4] Section 29.21, Copyright Act.

[5] https://archiveofourown.org/admin_posts/9918

[6] Section 5(1), Copyright Act.

[7] Sections 14.1, 14.2, Copyright Act.

[8] https://www.hollywoodreporter.com/business/business-news/ai-scraping-stealing-copyright-law-1235571501/

[9] https://www.arl.org/blog/applying-intellectual-property-law-to-ai-an-interview-with-betsy-rosenblatt/

[10] https://techcrunch.com/2023/06/13/fan-fiction-writers-are-trolling-ais-with-omegaverse-stories/

[11] https://archiveofourown.org/comments/650545801
