Copyrighted Works and AI Training: The Legal Landscape

Copyright law and AI training don't play nice. Here's what you should know.

In the fast-moving field of artificial intelligence (AI), OpenAI has drawn attention with a recent claim made to a UK parliamentary committee. The statement has reignited debate over whether AI models can legally be trained on copyrighted works without the rights holders’ explicit permission. This article examines the legal implications of that practice, the main perspectives on it, and what they could mean for the future of AI development.


OpenAI’s Assertive Stance

At the core of OpenAI’s argument is the bold claim that developing cutting-edge AI systems like ChatGPT would be “impossible” without access to vast amounts of copyrighted data. The company contends that the sheer scale of training these tools require makes it unfeasible to rely only on material outside copyright protection. Because copyright covers almost every form of human expression, from news articles to forum comments to digital images, OpenAI argues that excluding all protected works would leave far too little usable training data.

Obstacles to Copyright Adherence

OpenAI outlines why complying with copyright law is so difficult in AI development. Because most online content is protected and copyright’s scope is broad, it is hard to assemble training data that is both comprehensive and legally permissible. The company argues that limiting training data to public domain works created more than a century ago would not produce AI systems that meet the needs of contemporary society, and presents this as evidence that training exclusively on uncopyrighted material is impractical.

Legal Disputes and Lawsuits

OpenAI’s approach to model training has not gone unchallenged. The company faces multiple lawsuits alleging copyright infringement, including legal action from media outlets such as The New York Times. These cases highlight the complexity and legal risk of training AI on copyrighted material without permission. Despite them, OpenAI has not backed away from its broad approach to data collection and training.

Potential Middle Ground: Collaborations and Compensation

While maintaining that its practices are lawful, OpenAI acknowledges the potential for partnerships and compensation arrangements with publishers to “support and empower creators.” This signals a willingness to work with content creators and rights holders, although the company has given no clear sign that it will substantially scale back its data collection. Its position nonetheless leaves room for compromises that balance AI development against the interests of copyright holders.

The Fair Use Argument

OpenAI’s legal strategy rests on a broad interpretation of fair use, a doctrine of U.S. copyright law, under which it argues that training on large amounts of copyrighted data is permitted. On that basis, the company presents its extensive data collection as both lawful and essential to building advanced AI systems. How courts apply fair use to AI training is expected to be a central question in the ongoing lawsuits, and experts anticipate vigorous debate over whether systems built to absorb protected text, images, and other creative works infringe on their creators’ rights.

The Future Intersection of AI and Copyright

As AI grows more capable of emulating human expression, the law governing copyright and AI development is likely to change. OpenAI’s bet that courts will reject copyright maximalism and permit large-scale copying for training is a bold stance that could shape the future of AI innovation. The ongoing lawsuits are likely to set precedents for the industry, influencing how developers balance pushing technological boundaries with respecting intellectual property rights.


The convergence of AI development and copyright law is complex and contentious. OpenAI’s assertion that training leading AI models without copyrighted material would be “impossible” raises crucial ethical and legal questions about how those models are built. As the legal debates unfold and the industry works through these challenges, balancing innovation with respect for intellectual property rights becomes essential. The outcomes will shape the trajectory of AI development and its relationship with copyright for years to come.

Ready to make the most of technology in your business? Contact NPEC and follow us on social media today.
