Generative Pre-trained Transformer 3 (GPT-3) is a large language model released by OpenAI in 2020.
Like its predecessor, GPT-2, it is a decoder-only transformer model, a deep neural network architecture that supersedes recurrence- and convolution-based architectures with a technique known as "attention". This attention mechanism allows the model to focus selectively on the segments of input text it predicts to be most relevant.
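A minimal Python sketch of the scaled dot-product attention computation that underlies this mechanism; the toy dimensions below are illustrative only and do not reflect GPT-3's actual configuration:

<syntaxhighlight lang="python">
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: each output position is a mixture of the
    value vectors V, weighted by how relevant each key in K is to each query in Q."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # relevance of every key to every query
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over input positions
    return weights @ V                              # weighted mix of the values

# Toy example: three input positions with four-dimensional embeddings.
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(3, 4))
print(attention(Q, K, V).shape)  # (3, 4)
</syntaxhighlight>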
GPT-3 has 175 billion parameters, each with 16-bit precision, requiring 350 GB of storage since each parameter occupies 2 bytes. It has a context window of 2,048 tokens, and it has demonstrated strong "zero-shot" and "few-shot" learning abilities on many tasks.
On September 22, 2020, Microsoft announced that it had licensed GPT-3 exclusively. Others can still receive output from its public API, but only Microsoft has access to the underlying model.
Background
According to ''The Economist'', improved algorithms, more powerful computers, and a recent increase in the amount of digitized material have fueled a revolution in machine learning. New techniques in the 2010s resulted in "rapid improvements in tasks", including manipulating language.
Software models are trained to learn by using thousands or millions of examples in a "structure... loosely based on the neural architecture of the brain".
One architecture used in natural language processing (NLP) is a neural network based on a deep learning model introduced in 2017, the transformer architecture.
There are a number of NLP systems capable of processing, mining, organizing, connecting and contrasting textual input, as well as correctly answering questions.
On June 11, 2018, OpenAI researchers and engineers published a paper introducing the first generative pre-trained transformer (GPT), a type of generative large language model that is pre-trained on an enormous and diverse text corpus, followed by discriminative fine-tuning to focus on a specific task. GPT models are transformer-based deep-learning neural network architectures. Previously, the best-performing neural NLP models commonly employed supervised learning from large amounts of manually labeled data, which made it prohibitively expensive and time-consuming to train extremely large language models.
The first GPT model was known as "GPT-1," and it was followed by "GPT-2" in February 2019. Created as a direct scale-up of its predecessor, GPT-2 had both its parameter count and dataset size increased by a factor of 10. It had 1.5 billion parameters, and was trained on a dataset of 8 million web pages.
In February 2020, Microsoft introduced its Turing Natural Language Generation (T-NLG), which it claimed was the "largest language model ever published at 17 billion parameters." It performed better than any other language model at a variety of tasks, including summarizing texts and answering questions.
Training and capabilities
On May 28, 2020, an arXiv preprint by a group of 31 engineers and researchers at OpenAI described the development of GPT-3, a third-generation "state-of-the-art language model". The team increased the capacity of GPT-3 by over two orders of magnitude from that of its predecessor, GPT-2, making GPT-3 the largest non-sparse language model at the time. (Four preprints were released between May 28 and July 22, 2020.) Because GPT-3 is structurally similar to its predecessors, its greater accuracy is attributed to its increased capacity and greater number of parameters. GPT-3's capacity is ten times larger than that of Microsoft's Turing NLG, the next-largest NLP model known at the time.
Lambda Labs estimated a hypothetical cost of around US$4.6 million and 355 years to train GPT-3 on a single GPU in 2020, with lower actual training time achieved by using more GPUs in parallel.
Sixty percent of the weighted pre-training dataset for GPT-3 comes from a filtered version of Common Crawl consisting of 410 billion byte-pair-encoded tokens; fuzzy deduplication of this data used Apache Spark's MinHash LSH. The other sources are 19 billion tokens from WebText2 (22% of the weighted total), 12 billion tokens from Books1 (8%), 55 billion tokens from Books2 (8%), and 3 billion tokens from Wikipedia (3%).
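MinHash-based deduplication compares compact signatures rather than full documents. The following self-contained Python sketch illustrates only the underlying idea; it is not OpenAI's actual pipeline, which used Apache Spark's MinHash LSH implementation at scale:

<syntaxhighlight lang="python">
import hashlib

def minhash_signature(tokens, num_hashes=64):
    """Summarize a set of tokens by its minimum hash value under many
    differently seeded hash functions."""
    return [
        min(int(hashlib.md5(f"{seed}:{t}".encode()).hexdigest(), 16) for t in tokens)
        for seed in range(num_hashes)
    ]

def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching minimums estimates the Jaccard similarity
    of the underlying token sets."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

doc_a = set("the quick brown fox jumps over the lazy dog".split())
doc_b = set("the quick brown fox leaps over the lazy dog".split())
similarity = estimated_jaccard(minhash_signature(doc_a), minhash_signature(doc_b))
print(similarity)  # a high value flags the pair as near-duplicate candidates
</syntaxhighlight>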
GPT-3 was trained on hundreds of billions of words and is also capable of coding in CSS, JSX, and Python, among other languages.
Since GPT-3's training data was all-encompassing, it does not require further training for distinct language tasks. The training data contains occasional toxic language, and GPT-3 occasionally generates toxic language as a result of mimicking its training data. A study from the University of Washington found that GPT-3 produced toxic language at a level comparable to similar natural language processing models such as GPT-2 and CTRL. OpenAI has implemented several strategies to limit the amount of toxic language generated by GPT-3. As a result, GPT-3 produced less toxic language than its predecessor model, GPT-1, although it produced both more toxic generations and more toxic language overall than CTRL Wiki, a language model trained entirely on Wikipedia data.
On June 11, 2020, OpenAI announced that users could request access to its user-friendly GPT-3 API, a "machine learning toolset", to help OpenAI "explore the strengths and limits" of this new technology. The invitation described how this API had a general-purpose "text in, text out" interface that can complete almost "any English language task", instead of the usual single use case. According to one user who had access to a private early release of the OpenAI GPT-3 API, GPT-3 was "eerily good" at writing "amazingly coherent text" with only a few simple prompts. In an initial experiment, 80 US subjects were asked to judge whether short articles of roughly 200 words were written by humans or by GPT-3. The participants judged correctly 52% of the time, doing only slightly better than random guessing.
On November 18, 2021, OpenAI announced that enough safeguards had been implemented that access to its API would be unrestricted. OpenAI provided developers with a content moderation tool that helps them abide by OpenAI's content policy. On January 27, 2022, OpenAI announced that its newest GPT-3 language models, collectively referred to as InstructGPT, were now the default language models used on its API. According to OpenAI, InstructGPT produced content that was better aligned to user intentions by following instructions better, generating fewer made-up facts, and producing somewhat less toxic content.
Because GPT-3 can "generate news articles which human evaluators have difficulty distinguishing from articles written by humans," GPT-3 has the "potential to advance both the beneficial and harmful applications of language models." In their May 28, 2020 paper, the researchers described in detail the potential "harmful effects of GPT-3", which include "misinformation, spam, phishing, abuse of legal and governmental processes, fraudulent academic essay writing and social engineering pretexting". The authors draw attention to these dangers to call for research on risk mitigation.
GPT-3 is capable of performing zero-shot and few-shot learning (including one-shot).
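In few-shot use, the task is specified entirely inside the prompt, with no gradient updates to the model. A minimal illustrative prompt, in the style of the translation examples in the GPT-3 paper:

<syntaxhighlight lang="python">
# Few-shot prompting: a task description plus a handful of worked examples.
# The model is expected to continue the pattern on the final, incomplete line.
few_shot_prompt = """Translate English to French.

sea otter => loutre de mer
peppermint => menthe poivrée
cheese =>"""
# A completion such as " fromage" would follow the demonstrated pattern;
# a zero-shot prompt would omit the worked examples entirely.
</syntaxhighlight>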
In June 2022, Almira Osmanovic Thunström wrote that GPT-3 was the primary author of an article about itself, that they had submitted it for publication, and that it had been pre-published while awaiting completion of its review.
GPT-3 models
There are many models in the GPT-3 family, some serving different purposes than others. In the initial research paper, OpenAI described eight sizes of the main GPT-3 model, ranging from 125 million to 175 billion parameters (Table 2.1 of the paper).
Half of the models are accessible through the API, namely GPT-3-medium, GPT-3-xl, GPT-3-6.7B and GPT-3-175b, which are referred to as ada, babbage, curie and davinci respectively. While the size of the API models was not originally disclosed by OpenAI, EleutherAI announced the mapping between model sizes and API names in May 2021. These model sizes were later confirmed by OpenAI, but the sizes of subsequent models have not been disclosed.
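A minimal sketch of querying one of these models through the legacy OpenAI Python bindings (v0.x), under which the GPT-3 base models were exposed as ada, babbage, curie, and davinci; exact parameter names varied across library versions, and this Completion API has since been deprecated:

<syntaxhighlight lang="python">
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder credential

# "Text in, text out": send a prompt, receive a completion.
response = openai.Completion.create(
    engine="davinci",            # the 175-billion-parameter GPT-3 base model
    prompt="Once upon a time",
    max_tokens=32,               # cap on the number of generated tokens
    temperature=0.7,             # sampling randomness
)
print(response["choices"][0]["text"])
</syntaxhighlight>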
GPT-3.5
Generative Pre-trained Transformer 3.5 (GPT-3.5) is a subclass of GPT-3 models created by OpenAI in 2022.
On March 15, 2022, OpenAI made available new versions of GPT-3 and Codex in its API with edit and insert capabilities, under the names "text-davinci-002" and "code-davinci-002". These models were described as more capable than previous versions and were trained on data up to June 2021.
On November 28, 2022, OpenAI introduced text-davinci-003. On November 30, 2022, OpenAI began referring to these models as belonging to the "GPT-3.5" series, and released ChatGPT, which was fine-tuned from a model in the GPT-3.5 series. OpenAI does not include GPT-3.5 in GPT-3.
Models
There are three models:
* Chat
** gpt-3.5-turbo
* Text completion
** text-davinci-003
** text-davinci-002
GPT-3.5 with browsing
On April 10, 2023, OpenAI introduced a new variant of its GPT-3.5 series model, known as GPT-3.5 with Browsing (ALPHA). This updated model was described as building upon the capabilities of its predecessors "text-davinci-002" and "code-davinci-002".
The GPT-3.5 with Browsing (ALPHA) model incorporated the ability to access and browse online information. This has led to more accurate and up-to-date responses to user queries.
The GPT-3.5 with Browsing (ALPHA) model was trained on data up to September 2021, giving it more information than previous GPT-3.5 models, which were trained on data up to June 2021. The model aimed to provide developers and users with an advanced natural language processing tool that can effectively retrieve and synthesize online information.
To enable browsing capabilities, OpenAI implemented a new API that allows the GPT-3.5 with Browsing (ALPHA) model to access selected online resources during operation.
This feature allows users to ask questions or request information with the expectation that the model will deliver updated, accurate, and relevant answers based on the latest online sources available to it.
On April 27, 2023, OpenAI made the GPT-3.5 with Browsing (ALPHA) model publicly available to GPT Plus users, allowing more people to access its new features.
InstructGPT
InstructGPT is a fine-tuned version of GPT-3.5 trained on a dataset of human-written instructions.
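The training data consists of instruction/response pairs written by humans. The record below is purely hypothetical; OpenAI has not published the actual dataset or its field names:

<syntaxhighlight lang="python">
# A hypothetical instruction-following training example; the field names are
# illustrative only, as OpenAI's dataset format is not public.
example = {
    "instruction": "Explain the moon landing to a 6-year-old in a few sentences.",
    "response": (
        "People built a big rocket and flew it to the Moon. Two astronauts "
        "walked on it, then flew safely back home."
    ),
}
</syntaxhighlight>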
Reception
Applications
* GPT-3, specifically the Codex model, was the basis for GitHub Copilot, a code completion and generation software that can be used in various code editors and IDEs.
* GPT-3 is used in certain Microsoft products to translate conventional language into formal computer code.
* GPT-3 has been used in CodexDB to generate query-specific code for SQL processing.
* GPT-3 has been used by Jason Rohrer in a retro-themed chatbot project named "Project December", which is accessible online and allows users to converse with several AIs using GPT-3 technology.
* GPT-3 was used by ''The Guardian'' to write an article about AI being harmless to human beings. It was fed some ideas and produced eight different essays, which were ultimately merged into one article.
* GPT-3 was used in ''AI Dungeon'', which generates text-based adventure games. It was later replaced by a competing model after OpenAI changed its policy regarding generated content.
* GPT-3 is used to aid in writing copy and other marketing materials.
* A 2022 study from Drexel University suggested that GPT-3-based systems could be used to screen for early signs of Alzheimer's disease.
Reviews
* In a July 2020 review in ''The New York Times'', Farhad Manjoo said that GPT-3's ability to generate computer code, poetry, and prose is not just "amazing", "spooky", and "humbling", but also "more than a little terrifying".
* ''Daily Nous'' presented a series of articles by nine philosophers on GPT-3. Australian philosopher David Chalmers described GPT-3 as "one of the most interesting and important AI systems ever produced".
* A review in ''Wired'' said that GPT-3 was "provoking chills across Silicon Valley".
* The ''National Law Review'' said that GPT-3 is an "impressive step in the larger process", with OpenAI and others finding "useful applications for all of this power" while continuing to "work toward a more general intelligence".
* An article in the ''MIT Technology Review'', co-written by deep learning critic Gary Marcus, stated that GPT-3's "comprehension of the world is often seriously off, which means you can never really trust what it says." According to the authors, GPT-3 models relationships between words without having an understanding of the meaning behind each word.
* Jerome Pesenti, head of the Facebook AI lab, said GPT-3 is "unsafe," pointing to the sexist, racist and other biased and negative language generated by the system when it was asked to discuss Jews, women, black people, and the Holocaust.
* Nabla, a French start-up specializing in healthcare technology, tested GPT-3 as a medical chatbot, though OpenAI itself warned against such use. As expected, GPT-3 showed several limitations. For example, while testing GPT-3 responses about mental health issues, the AI advised a simulated patient to commit suicide.
* Noam Chomsky expressed his skepticism about GPT-3's scientific value: "It's not a language model. It works just as well for impossible languages as for actual languages. It is therefore refuted, if intended as a language model, by normal scientific criteria. ... Perhaps it's useful for some purpose, but it seems to tell us nothing about language or cognition generally."
* Luciano Floridi and Massimo Chiriatti highlighted the risk of "cheap production of good, semantic artefacts".
* OpenAI's Sam Altman himself criticized what he called "GPT-3 hype", acknowledging GPT-3 "has serious weaknesses and sometimes makes very silly mistakes... AI is going to change the world, but GPT-3 is just a very early glimpse."
Criticism
GPT-3's builder, OpenAI, was initially founded as a non-profit in 2015. In 2019, OpenAI broke from its usual open-source standards by not publicly releasing GPT-3's predecessor model, citing concerns that the model could facilitate the propagation of fake news. OpenAI eventually released a version of GPT-2 that was 8% of the original model's size. In the same year, OpenAI restructured to be a for-profit company. In 2020, Microsoft announced the company had exclusive licensing of GPT-3 for Microsoft's products and services following a multi-billion dollar investment in OpenAI. The agreement permits OpenAI to offer a public-facing API such that users can send text to GPT-3 to receive the model's output, but only Microsoft will have access to GPT-3's source code.
Large language models, such as GPT-3, have come under criticism from a few of Google's AI ethics researchers for the environmental impact of training and storing the models, detailed in a paper co-authored by Timnit Gebru and Emily M. Bender in 2021.
The growing use of automated writing technologies based on GPT-3 and other language generators has raised concerns regarding academic integrity and raised the stakes of how universities and schools will gauge what constitutes academic misconduct such as plagiarism.
OpenAI's GPT series was built with data from the Common Crawl dataset, a conglomerate of copyrighted articles, internet posts, web pages, and books scraped from 60 million domains over a period of 12 years. ''TechCrunch'' reports this training data includes copyrighted material from the BBC, ''The New York Times'', Reddit, the full text of online books, and more. In its response to a 2019 Request for Comments on Intellectual Property Protection for Artificial Intelligence Innovation from the United States Patent and Trademark Office (USPTO), OpenAI argued that "Under current law, training AI systems [such as its GPT models] constitutes fair use," but that "given the lack of case law on point, OpenAI and other AI developers like us face substantial legal uncertainty and compliance costs."
See also
* BERT (language model)
* Hallucination (artificial intelligence)
* LaMDA
* Gemini (language model)
* Wu Dao
* GPTZero
References