In the 2020s, the rapid increase in the capabilities of deep learning-based generative artificial intelligence models, including text-to-image models such as Stable Diffusion and large language models such as ChatGPT, has raised questions about how copyright law applies to the training and use of such models. Because there is limited existing case law, experts consider this area to be fraught with uncertainty. The central question is whether infringement occurs when a generative AI model is trained or when it is used. Popular deep learning models are generally trained on very large datasets of media scraped from the Internet, much of which is copyrighted. Since the process of assembling training data involves making copies of copyrighted works, it may violate the copyright holder's exclusive right to control the reproduction of their work, unless the use is covered by exceptions under a given jurisdiction's copyright statute. Additionally, the use of a model's outputs could be infringing, and the model creator may be accused of "vicarious liability" for said infringement. As of 2023, there are a number of pending US lawsuits challenging the use of copyrighted data to train AI models, with defendants arguing that this falls under fair use.

In at least one jurisdiction in the US, output generated solely by a machine has been ruled ineligible for copyright protection. In ''Thaler v. Perlmutter'', the court held that "work created absent any human involvement" is ineligible for copyright protection, stating, "Human authorship is a bedrock requirement of copyright." What was left undecided in this case is the ''degree'' of human involvement required to take a work from being machine-created to being human-created. Because originality is central to questions of copyright eligibility, most jurisdictions protect only "original" works having a human author. The US Copyright Office has said it will register works that contain "otherwise unprotectable material", but only if the new work contains a "sufficient amount of original authorship".

Copyright status of AI-generated works

Most legal jurisdictions grant copyright only to original works of authorship by human authors. In the US, the Copyright Act protects "original works of authorship". The U.S. Copyright Office has interpreted this as being limited to works "created by a human being", declining to grant copyright to works generated solely by a machine. Some have suggested that certain AI generations might be copyrightable in the US and similar jurisdictions if it can be shown that the human who ran the AI program exercised sufficient originality in selecting the inputs to the AI or editing the AI's output. Proponents of this view suggest that an AI model may be viewed as merely a tool (akin to a pen or a camera) used by its human operator to express their creative vision. For example, proponents argue that if the standard of originality can be satisfied by an artist clicking the shutter button on a camera, then perhaps artists using generative AI should get similar deference, especially if they go through multiple rounds of revision to refine their prompts to the AI.

Other proponents argue that the Copyright Office is not taking a technology-neutral approach to the use of AI (or algorithmic) tools. For other creative expressions (music, photography, writing) the test is effectively whether there is ''de minimis'' human creativity. For works using AI tools, the Copyright Office has applied a different test, namely whether there is no more than ''de minimis'' technological involvement (Peter Pink-Howitt, "Copyright, AI And Generative Art", ''Ramparts'', 2023). This difference in approach can be seen in the recent decision on a registration claim by Jason Matthew Allen for his work ''Théâtre D'opéra Spatial'', created using Midjourney (and an upscaling tool), where the Copyright Office stated:

The Board finds that the Work contains more than a de minimis amount of content generated by artificial intelligence ("AI"), and this content must therefore be disclaimed in an application for registration. Because Mr. Allen is unwilling to disclaim the AI-generated material, the Work cannot be registered as submitted.

As AI is increasingly used to generate literature, music, and other forms of art, the US Copyright Office has released new guidance that focuses on whether a work, including one containing material generated by artificial intelligence, is merely a 'mechanical reproduction' or is the 'manifestation of the author's own creative conception'. The US Copyright Office published a rule in March 2023 on a range of issues related to the use of AI, in which it stated:

...because the Office receives roughly half a million applications for registration each year, it sees new trends in registration activity that may require modifying or expanding the information required to be disclosed on an application. One such recent development is the use of sophisticated artificial intelligence ("AI") technologies capable of producing expressive material. These technologies "train" on vast quantities of preexisting human-authored works and use inferences from that training to generate new content. Some systems operate in response to a user's textual instruction, called a "prompt." The resulting output may be textual, visual, or audio, and is determined by the AI based on its design and the material it has been trained on. These technologies, often described as "generative AI," raise questions about whether the material they produce is protected by copyright, whether works consisting of both human-authored and AI-generated material may be registered, and what information should be provided to the Office by applicants seeking to register them.

Some jurisdictions include explicit statutory language related to computer-generated works, including the United Kingdom's Copyright, Designs and Patents Act 1988, which states:

In the case of a literary, dramatic, musical or artistic work which is computer-generated, the author shall be taken to be the person by whom the arrangements necessary for the creation of the work are undertaken.

However, the UK provision on computer-generated works relates to autonomous creations by computer programs. Individuals using AI tools will usually be the authors of the resulting works, assuming those works meet the minimum requirements for copyright protection. In respect of AI, the computer-generated-work language relates to the ability of human programmers to hold copyright in the autonomous productions of AI tools (i.e. where there is no direct human input):

In so far as each composite frame is a computer generated work then the arrangements necessary for the creation of the work were undertaken by Mr Jones because he devised the appearance of the various elements of the game and the rules and logic by which each frame is generated and he wrote the relevant computer program. In these circumstances I am satisfied that Mr Jones is the person by whom the arrangements necessary for the creation of the works were undertaken and therefore is deemed to be the author by virtue of s.9(3)

There are wide-ranging reviews of the use of AI and its impact on copyright. The UK government has consulted on the use of generative tools and AI in respect of intellectual property, leading to a proposed specialist Code of Practice "to provide guidance to support AI firms to access copyrighted work as an input to their models, whilst ensuring there are protections on generated output to support right holders of copyrighted work" (''Artificial Intelligence and Intellectual Property: copyright and patents: Government response to consultation'', UK Government, 2023). The US Copyright Office recently published a notice of inquiry and request for comments following its 2023 Registration Guidance. On November 27, 2023, the Beijing Internet Court issued a decision recognizing copyright in AI-generated images.

As noted by a lawyer and AI art creator, the challenge for intellectual property regulators, legislators and the courts is how to protect human creativity in a technologically neutral fashion whilst considering the risks of automated AI factories. AI tools have the ability to autonomously create a range of material that is potentially subject to copyright (music, blogs, poetry, images, and technical papers) or other intellectual property rights (such as patents and design rights). This represents an unprecedented challenge to existing intellectual property regimes.

Training on copyrighted data

Popular deep learning models are generally trained on very large datasets of media (such as publicly available images and the text of web pages) scraped from the Internet, much of which is copyrighted. (As of 2023, alternative open-source approaches like "The Stack" by Hugging Face are rare.) For text-to-image models specifically, the images are paired with text such as captions, titles, and tags that describe their contents. The text and images are then converted into numeric formats the AI can analyze. Next, a deep learning model identifies patterns linking the encoded text and image data, learning which text concepts correspond to elements in images. Through iterative testing and tuning, the model refines its accuracy in matching images to text descriptions. Finally, the trained model undergoes validation to evaluate its skill at generating or manipulating new images based solely on text prompts provided after the training process.

Because assembling these training datasets involves making copies of copyrighted works, this has raised the question of whether the process infringes the copyright holders' exclusive right to make reproductions of their works. Machine learning developers in the US have traditionally presumed this to be allowable under fair use, because the use of copyrighted work is transformative and limited. The situation has been compared to Google Books's scanning of copyrighted books in ''Authors Guild, Inc. v. Google, Inc.'', which was ultimately found to be fair use, because the scanned content was not made publicly available and the use was non-expressive. As of 2023, there were a number of US lawsuits disputing this, arguing that the training of machine learning models infringed the copyright of the authors of works contained in the training data.

Timothy B. Lee, in ''Ars Technica'', argues that if the plaintiffs succeed, this may shift the balance of power in favour of large corporations such as Google, Microsoft and Meta, which can afford to license large amounts of training data from copyright holders and leverage their own proprietary datasets of user-generated data. IP scholars Bryan Casey and Mark Lemley argue in the ''Texas Law Review'' that datasets are so large that "there is no plausible option simply to license all of the (data)... allowing (any generative training) copyright claim is tantamount to saying, not that copyright owners will get paid, but that the use won't be permitted at all." Other scholars disagree; some predict a similar outcome to US music licensing procedures.

A number of jurisdictions have explicitly incorporated exceptions allowing for "text and data mining" (TDM) in their copyright statutes, including the United Kingdom, Germany, Japan, and the EU. Unlike the EU, the United Kingdom prohibits data mining for commercial purposes, but its government has proposed that this should be changed to support the development of AI: "''For text and data mining, we plan to introduce a new copyright and database exception which allows TDM for any purpose. Rights holders will still have safeguards to protect their content, including a requirement for lawful access.''" As of June 2023, a clause in the draft EU AI Act would require generative AI to "make available summaries of the copyrighted material that was used to train their systems".
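The training steps described earlier in this section (pairing images with captions, encoding both as numbers, learning which words go with which image content, and then producing output from a prompt) can be illustrated with a deliberately tiny sketch. It is not a deep learning model: it only averages the encoded images that co-occur with each caption word, and the function names, the two-item dataset, and the 2x2 greyscale "images" are all hypothetical, chosen purely for illustration.

```python
from collections import defaultdict


def encode_text(caption):
    """Turn a caption into a list of lowercase word tokens."""
    return caption.lower().split()


def encode_image(pixels):
    """Flatten a grid of 0-255 grey values into floats in [0, 1]."""
    return [value / 255 for row in pixels for value in row]


def train(pairs):
    """Learn, for each caption word, the mean encoded image it appears with."""
    sums = {}
    counts = defaultdict(int)
    for caption, pixels in pairs:
        vector = encode_image(pixels)  # a numeric copy of the image is made here
        for word in set(encode_text(caption)):
            if word not in sums:
                sums[word] = [0.0] * len(vector)
            sums[word] = [s + v for s, v in zip(sums[word], vector)]
            counts[word] += 1
    return {
        word: [s / counts[word] for s in total]
        for word, total in sums.items()
    }


def generate(model, prompt):
    """Average the learned vectors of the prompt words the model knows."""
    known = [model[w] for w in encode_text(prompt) if w in model]
    if not known:
        return []
    return [sum(column) / len(known) for column in zip(*known)]


# Hypothetical two-example dataset of captioned 2x2 greyscale images.
dataset = [
    ("bright square", [[255, 255], [255, 255]]),
    ("dark square", [[0, 0], [0, 0]]),
]
model = train(dataset)
print(generate(model, "bright"))
```

Even in this toy version, every training image is read and converted in full before anything is learned from it, which is the copying step that the reproduction-right arguments above focus on. Real systems replace the averaging with iterative optimisation of a neural network and add a separate validation stage.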

Copyright infringing AI outputs

In some cases, deep learning models may "memorize" the details of particular items in their training set, and reproduce them at generation time, such that their outputs may be considered copyright infringement. This behaviour is generally considered undesirable by AI developers (being considered a form of overfitting), and disagreement exists as to how prevalent this behaviour is in modern systems. OpenAI has argued that "well-constructed AI systems generally do not regenerate, in any nontrivial portion, unaltered data from any particular work in their training corpus". Under US law, to prove that an AI output infringes a copyright, a plaintiff must show the copyrighted work was "actually copied", meaning that the AI generates output which is "substantially similar" to their work, and that the AI had access to their work. Since fictional characters enjoy some copyright protection in the US and other jurisdictions, an AI may also produce infringing content in the form of novel works which incorporate fictional characters.

In the course of learning to statistically model the data on which they are trained, deep generative AI models may learn to imitate the distinct style of particular authors in the training set. For example, a generative image model such as Stable Diffusion is able to model the stylistic characteristics of an artist like Pablo Picasso (including his particular brush strokes, use of colour, perspective, and so on), and a user can engineer a prompt such as "an astronaut riding a horse, by Picasso" to cause the model to generate a novel image applying the artist's style to an arbitrary subject. However, an artist's overall style is generally not subject to copyright protection.
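The "memorization" discussed in this section, where a model reproduces parts of a training item verbatim, can be made concrete with a rough overlap measure for text. The sketch below is only an illustration under stated assumptions: the function names, the five-word window, and the sample strings are hypothetical, and a word-sequence overlap score is not the legal test for "substantial similarity".

```python
def ngrams(text, n):
    """Return the set of n-word sequences in text (case-insensitive)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def verbatim_overlap(output, source, n=5):
    """Fraction of the output's n-word sequences that also occur in source."""
    output_grams = ngrams(output, n)
    if not output_grams:
        return 0.0  # output is shorter than n words, nothing to compare
    return len(output_grams & ngrams(source, n)) / len(output_grams)


# Hypothetical training text and two hypothetical model outputs.
source = "the quick brown fox jumps over the lazy dog near the river bank"
copied = "the quick brown fox jumps over the lazy dog"
novel = "a slow green turtle walks under the busy bridge"

print(verbatim_overlap(copied, source))
print(verbatim_overlap(novel, source))
```

A score near 1 means almost every five-word run in the output also appears in the source, which is the kind of regurgitation developers treat as overfitting. A score of 0 says nothing about imitation of style, which, as noted above, is generally not protected in any case.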

Litigation

* A November 2022 class action lawsuit against Microsoft, GitHub and OpenAI alleged that GitHub Copilot, an AI-powered code editing tool trained on public GitHub repositories, violated the copyright of the repositories' authors, noting that the tool was able to generate source code which matched its training data verbatim, without providing attribution.
* In January 2023, three artists, Sarah Andersen, Kelly McKernan, and Karla Ortiz, filed a class action copyright infringement lawsuit against Stability AI, Midjourney, and DeviantArt, claiming that these companies have infringed the rights of millions of artists by training AI tools on five billion images scraped from the web without the consent of the original artists. The plaintiffs' complaint has been criticized for technical inaccuracies, such as incorrectly claiming that "a trained diffusion model can produce a copy of any of its Training Images", and describing Stable Diffusion as "merely a complex collage tool". In addition to copyright infringement, the plaintiffs allege unlawful competition and violation of their right of publicity in relation to AI tools' ability to create works in the style of the plaintiffs ''en masse''. In July 2023, U.S. District Judge William Orrick indicated that he was inclined to dismiss most of the lawsuit filed by Andersen, McKernan, and Ortiz but allowed them to file a new complaint.
* In January 2023, Stability AI was sued in London by Getty Images for using its images in their training data without purchasing a license.
* Getty filed another suit against Stability AI in a US district court in Delaware in February 2023. The suit again alleges copyright infringement for the use of Getty's images in the training of Stable Diffusion, and further argues that the model infringes Getty's trademark by generating images with Getty's watermark.
* In July 2023, authors Paul Tremblay and Mona Awad filed a lawsuit against OpenAI, alleging that ChatGPT, OpenAI's language model, used their copyrighted books without permission. The plaintiffs point to ChatGPT's accurate summaries of their works as evidence of unauthorized use. The lawsuit highlights the conflict between copyright owners and AI companies, and may prompt further discussion of copyright rules and the disclosure of data sources.
* In the case ''Thaler v. Vidal'', the U.S. Court of Appeals for the Federal Circuit ruled on the patentability of inventions created by an AI system. The case revolved around Stephen Thaler's use of his AI program, DABUS, in the creation of two inventions. Thaler argued that DABUS should be recognized as the inventor. The court upheld the U.S. Patent and Trademark Office's decision, stating that under U.S. law, only "natural persons" can be named as inventors on patent applications. While the USPTO has not challenged granting patent protection to AI inventions if the inventor on the application is a "natural person", copyright protection is not granted to art produced by a machine without any creative input or intervention from a human author. The USPTO has since published a rule in February 2024 affirming this ruling, but allowing human inventors to incorporate the output of artificial intelligence, as long as this method is appropriately documented in the patent application (https://arstechnica.com/information-technology/2024/02/us-says-ai-models-cant-hold-patents/). However, such documentation may become virtually impossible when the inner workings and the use of AI in inventive transactions are not adequately understood or are largely unknown (Valinasab, Omid, "Big Data Analytics to Automate Patent Disclosure of Artificial Intelligence's Inventions", U.S.F. Intell. Prop. & Tech. L.J. 133 (2023)).

References

* Guadamuz, Andres (October 2017). "Artificial intelligence and copyright". ''WIPO Magazine''. https://www.wipo.int/wipo_magazine/en/2017/05/article_0003.html
* Vincent, James (15 November 2022). "The scary truth about AI copyright is nobody knows what will happen next". ''The Verge''. https://www.theverge.com/23444685/generative-ai-copyright-infringement-legal-fair-use-training-data
* Zirpoli, Christopher T. (24 February 2023). "Generative Artificial Intelligence and Copyright Law". Congressional Research Service. https://crsreports.congress.gov/product/pdf/LSB/LSB10922
* Lee, Jyh-An; Hilty, Reto; Liu, Kung-Chung, eds. (2021). ''Artificial Intelligence and Intellectual Property''. Oxford University Press. doi:10.1093/oso/9780198870944.001.0001. ISBN 978-0-19-887094-4.
* Lee, Timothy B. (3 April 2023). "Stable Diffusion copyright lawsuits could be a legal earthquake for AI". ''Ars Technica''. https://arstechnica.com/tech-policy/2023/04/stable-diffusion-copyright-lawsuits-could-be-a-legal-earthquake-for-ai/
* Vincent, James (8 November 2022). "The lawsuit that could rewrite the rules of AI copyright". ''The Verge''. https://www.theverge.com/2022/11/8/23446821/microsoft-openai-github-copilot-class-action-lawsuit-ai-copyright-violation-training-data. Retrieved 7 December 2022.
* Korn, Jennifer (17 January 2023). "Getty Images suing the makers of popular AI art tool for allegedly stealing photos". CNN. https://www.cnn.com/2023/01/17/tech/getty-images-stability-ai-lawsuit/index.html. Retrieved 22 January 2023.
* Vincent, James (16 January 2023). "AI art tools Stable Diffusion and Midjourney targeted with copyright lawsuit". ''The Verge''.
* Edwards, Benj (16 January 2023). "Artists file class-action lawsuit against AI image generator companies". ''Ars Technica''. https://arstechnica.com/information-technology/2023/01/artists-file-class-action-lawsuit-against-ai-image-generator-companies/
* "Getty Images Statement" (17 January 2023). newsroom.gettyimages.com. https://newsroom.gettyimages.com/en/getty-images/getty-images-statement. Retrieved 24 January 2023.
* Belanger, Ashley (6 February 2023). "Getty sues Stability AI for copying 12M photos and imitating famous watermark". ''Ars Technica''. https://arstechnica.com/tech-policy/2023/02/getty-sues-stability-ai-for-copying-12m-photos-and-imitating-famous-watermark/

External links

''Getty Images (US), Inc. v. Stability AI, Inc.'' filings