Generative Pre-trained Transformer 4 (GPT-4) is a
multimodal large language model trained and created by
OpenAI and the fourth in its series of
GPT foundation models.
It was launched on March 14, 2023,
and made publicly available via the paid
chatbot product
ChatGPT Plus until being replaced in 2025, via OpenAI's
API
An application programming interface (API) is a connection between computers or between computer programs. It is a type of software interface, offering a service to other pieces of software. A document or standard that describes how to build ...
, and via the free chatbot
Microsoft Copilot.
GPT-4 is more capable than its predecessor
GPT-3.5.
GPT-4 Vision (GPT-4V) is a version of GPT-4 that can process images in addition to text. OpenAI has not revealed technical details and statistics about GPT-4, such as the precise size of the model.
As a
transformer
In electrical engineering, a transformer is a passive component that transfers electrical energy from one electrical circuit to another circuit, or multiple Electrical network, circuits. A varying current in any coil of the transformer produces ...
-based model, GPT-4 uses a paradigm where pre-training using both public data and "data licensed from third-party providers" is used to predict the next
token. After this step, the model was then fine-tuned with
reinforcement learning
Reinforcement learning (RL) is an interdisciplinary area of machine learning and optimal control concerned with how an intelligent agent should take actions in a dynamic environment in order to maximize a reward signal. Reinforcement learnin ...
feedback from
humans and AI for
human alignment and policy compliance.
Background
OpenAI introduced the first GPT model (GPT-1) in 2018, publishing a paper called "Improving Language Understanding by Generative Pre-Training", which was based on the transformer architecture and trained on a large
corpus of books. The next year, they introduced
GPT-2, a larger model that could generate coherent text. In 2020, they introduced
GPT-3, a model with over 100 times as many parameters as GPT-2, that could perform various tasks with few examples. GPT-3 was further improved into
GPT-3.5, which was used to create the chatbot product
ChatGPT.
Rumors claim that GPT-4 has 1.76 trillion parameters, which was first estimated by the speed it was running and by
George Hotz.
Capabilities
OpenAI stated that GPT-4 is "more reliable, creative, and able to handle much more nuanced instructions than GPT-3.5." They produced two versions of GPT-4, with context windows of 8,192 and 32,768 tokens, a significant improvement over GPT-3.5 and GPT-3, which were limited to 4,096 and 2,048 tokens respectively. Some of the capabilities of GPT-4 were predicted by OpenAI before training it, although other capabilities remained hard to predict due to
breaks
Break or Breaks or The Break may refer to:
Time off from duties
* Recess (break), time in which a group of people is temporarily dismissed from its duties
* Break (work), time off during a shift/recess
** Coffee break, a short mid-morning rest ...
in downstream scaling laws. Unlike its predecessors, GPT-4 is a multimodal model: it can take images as well as text as input;
this gives it the ability to describe the humor in unusual images, summarize text from screenshots, and answer exam questions that contain diagrams.
It can now interact with users through spoken words and respond to images, allowing for more natural conversations and the ability to provide suggestions or answers based on photo uploads.
To gain further control over GPT-4, OpenAI introduced the "system message", a directive in
natural language
A natural language or ordinary language is a language that occurs naturally in a human community by a process of use, repetition, and change. It can take different forms, typically either a spoken language or a sign language. Natural languages ...
given to GPT-4 in order to specify its tone of voice and task. For example, the system message can instruct the model to "be a Shakespearean pirate", in which case it will respond in rhyming, Shakespearean prose, or request it to "always write the output of
tsresponse in
JSON
JSON (JavaScript Object Notation, pronounced or ) is an open standard file format and electronic data interchange, data interchange format that uses Human-readable medium and data, human-readable text to store and transmit data objects consi ...
", in which case the model will do so, adding keys and values as it sees fit to match the structure of its reply. In the examples provided by OpenAI, GPT-4 refused to deviate from its system message despite requests to do otherwise by the user during the conversation.
When instructed to do so, GPT-4 can interact with external interfaces. For example, the model could be instructed to enclose a query within
<search></search>
tags to perform a web search, the result of which would be inserted into the model's prompt to allow it to form a response. This allows the model to perform tasks beyond its normal text-prediction capabilities, such as using
API
An application programming interface (API) is a connection between computers or between computer programs. It is a type of software interface, offering a service to other pieces of software. A document or standard that describes how to build ...
s, generating images, and accessing and summarizing webpages.
A 2023 article in ''
Nature
Nature is an inherent character or constitution, particularly of the Ecosphere (planetary), ecosphere or the universe as a whole. In this general sense nature refers to the Scientific law, laws, elements and phenomenon, phenomena of the physic ...
'' stated programmers have found GPT-4 useful for assisting in coding tasks (despite its propensity for error), such as finding errors in existing code and suggesting optimizations to improve performance. The article quoted a biophysicist who found that the time he required to port one of his programs from
MATLAB
MATLAB (an abbreviation of "MATrix LABoratory") is a proprietary multi-paradigm programming language and numeric computing environment developed by MathWorks. MATLAB allows matrix manipulations, plotting of functions and data, implementat ...
to
Python went down from days to "an hour or so". On a test of 89 security scenarios, GPT-4 produced code vulnerable to SQL injection attacks 5% of the time, an improvement over GitHub Copilot from the year 2021, which produced vulnerabilities 40% of the time.
In November 2023, OpenAI announced the GPT-4 Turbo and GPT-4 Turbo with Vision model, which features a 128K context window and significantly cheaper pricing.
GPT-4o
On May 13, 2024, OpenAI introduced GPT-4o ("o" for "omni"), a model that marks a significant advancement by processing and generating outputs across text, audio, and image modalities in real time. GPT-4o exhibits rapid response times comparable to human reaction in conversations, substantially improved performance on non-English languages, and enhanced understanding of vision and audio.
GPT-4o integrates its various inputs and outputs under a unified model, making it faster, more cost-effective, and efficient than its predecessors. GPT-4o achieves state-of-the-art results in multilingual and vision benchmarks, setting new records in audio speech recognition and translation.
OpenAI plans to immediately roll out GPT-4o's image and text capabilities to ChatGPT, including its free tier, with voice mode becoming available for ChatGPT Plus users in coming weeks. They plan to make the model's audio and video capabilities available for limited API partners in coming weeks.
In its launch announcement, OpenAI noted GPT-4o's capabilities presented new safety challenges, and noted mitigations and limitations as a result.
Aptitude on standardized tests
GPT-4 demonstrates aptitude on several standardized tests. OpenAI claims that in their own testing the model received a score of 1410 on the
SAT
The SAT ( ) is a standardized test widely used for college admissions in the United States. Since its debut in 1926, its name and Test score, scoring have changed several times. For much of its history, it was called the Scholastic Aptitude Test ...
(94th
percentile), 163 on the
LSAT (88th percentile), and 298 on the
Uniform Bar Exam (90th percentile). In contrast, OpenAI claims that GPT-3.5 received scores for the same exams in the 82nd,
40th, and 10th percentiles, respectively.
GPT-4 also passed an oncology exam,
an engineering exam
and a plastic surgery exam.
In the
Torrance Tests of Creative Thinking, GPT-4 scored within the top 1% for originality and fluency, while its flexibility scores ranged from the 93rd to the 99th percentile. However, some studies raise questions about the reliability of these benchmarks, particularly concerning the Uniform Bar Exam.
Medical applications
Researchers from Microsoft tested GPT-4 on medical problems and found "that GPT-4, without any specialized prompt crafting, exceeds the passing score on
USMLE by over 20 points and outperforms earlier general-purpose models (GPT-3.5) as well as models specifically fine-tuned on medical knowledge (
Med-PaLM, a prompt-tuned version of Flan-PaLM 540B). Despite GPT-4's strong performance on tests, the report warns of "significant risks" of using LLMs in medical applications, as they may provide inaccurate recommendations and
hallucinate major factual errors. Researchers from Columbia University and Duke University have also demonstrated that GPT-4 can be utilized for cell type annotation, a standard task in the analysis of single-cell RNA-seq data.
In April 2023, Microsoft and
Epic Systems announced that they will provide healthcare providers with GPT-4-powered systems for assisting in responding to questions from patients and analysing medical records.
Limitations
Like its predecessors, GPT-4 has been known to
hallucinate, meaning that the outputs may include information not in the training data or that contradicts the user's prompt.
GPT-4 also lacks transparency in its decision-making processes. If requested, the model is able to provide an explanation as to how and why it makes its decisions but these explanations are formed post-hoc; it's impossible to verify if those explanations truly reflect the actual process. In many cases, when asked to explain its logic, GPT-4 will give explanations that directly contradict its previous statements.
In 2023, researchers tested GPT-4 against a new benchmark called ConceptARC, designed to measure abstract reasoning, and found it scored below 33% on all categories, while models specialized for similar tasks scored 60% on most, and humans scored at least 91% on all. Sam Bowman, who was not involved in the research, said the results do not necessarily indicate a lack of abstract reasoning abilities, because the test is visual, while GPT-4 is a language model.
A January 2024 study conducted by researchers at
Cohen Children's Medical Center found that GPT-3.5 had an accuracy rate of 17% when diagnosing pediatric medical cases.
Bias
GPT-4 was trained in two stages. First, the model was given large datasets of text taken from the internet and trained to predict the next
token (roughly corresponding to a word) in those datasets. Second, human reviews are used to fine-tune the system in a process called
reinforcement learning from human feedback, which trains the model to refuse prompts which go against OpenAI's definition of harmful behavior, such as questions on how to perform illegal activities, advice on how to harm oneself or others, or requests for descriptions of graphic, violent, or sexual content.
Microsoft researchers suggested GPT-4 may exhibit
cognitive bias
A cognitive bias is a systematic pattern of deviation from norm (philosophy), norm or rationality in judgment. Individuals create their own "subjective reality" from their perception of the input. An individual's construction of reality, not the ...
es such as
confirmation bias
Confirmation bias (also confirmatory bias, myside bias, or congeniality bias) is the tendency to search for, interpret, favor and recall information in a way that confirms or supports one's prior beliefs or Value (ethics and social sciences), val ...
,
anchoring
An anchor is a device, normally made of metal, used to secure a Watercraft, vessel to the Seabed, bed of a body of water to prevent the craft from drifting due to Leeway, wind or Ocean current, current. The word derives from Latin ', which ...
, and
base-rate neglect.
Training

OpenAI did not release the technical details of GPT-4; the technical report explicitly refrained from specifying the model size, architecture, or hardware used during either training or
inference
Inferences are steps in logical reasoning, moving from premises to logical consequences; etymologically, the word '' infer'' means to "carry forward". Inference is theoretically traditionally divided into deduction and induction, a distinct ...
. While the report described that the model was trained using a combination of first
supervised learning
In machine learning, supervised learning (SL) is a paradigm where a Statistical model, model is trained using input objects (e.g. a vector of predictor variables) and desired output values (also known as a ''supervisory signal''), which are often ...
on a large
dataset, then
reinforcement learning
Reinforcement learning (RL) is an interdisciplinary area of machine learning and optimal control concerned with how an intelligent agent should take actions in a dynamic environment in order to maximize a reward signal. Reinforcement learnin ...
using both
human
Humans (''Homo sapiens'') or modern humans are the most common and widespread species of primate, and the last surviving species of the genus ''Homo''. They are Hominidae, great apes characterized by their Prehistory of nakedness and clothing ...
and AI feedback, it did not provide details of the training, including the process by which the training dataset was constructed, the computing power required, or any
hyperparameters such as the
learning rate, epoch count, or
optimizer(s) used. The report claimed that "the competitive landscape and the safety implications of large-scale models" were factors that influenced this decision.
Sam Altman stated that the cost of training GPT-4 was more than $100 million. News website
Semafor claimed that they had spoken with "eight people familiar with the inside story" and found that GPT-4 had 1 trillion parameters.
Alignment
According to their report, OpenAI conducted internal adversarial testing on GPT-4 prior to the launch date, with dedicated
red teams composed of researchers and industry professionals to mitigate potential vulnerabilities. As part of these efforts, they granted the
Alignment Research Center early access to the models to assess
power-seeking risks. In order to properly refuse harmful prompts, outputs from GPT-4 were tweaked using the model itself as a tool. A GPT-4 classifier serving as a rule-based reward model (RBRM) would take prompts, the corresponding output from the GPT-4 policy model, and a human-written set of rules to classify the output according to the rubric. GPT-4 was then rewarded for refusing to respond to harmful prompts as classified by the RBRM.
Usage
ChatGPT
ChatGPT Plus is an enhanced version of ChatGPT
available for a US$20 per month subscription fee. As of 2023, ChatGPT Plus utilized GPT-4, whereas the free version of ChatGPT was backed by GPT-3.5. OpenAI also made GPT-4 available to a select group of applicants through their GPT-4 API waitlist; after being accepted, an additional fee of US$0.03 per 1000
tokens in the initial text provided to the model ("prompt"), and US$0.06 per 1000 tokens that the model generates ("completion"), was charged for access to the version of the model with an 8192-token
context window; for the 32768-token context window, the prices were doubled.
In March 2023, ChatGPT Plus users got access to third-party plugins and to a browsing mode (with Internet access). In July 2023, OpenAI made its proprietary Code Interpreter plugin accessible to all subscribers of ChatGPT Plus. The Interpreter provides a wide range of capabilities, including data analysis and interpretation, instant data formatting, personal data scientist services, creative solutions, musical taste analysis, video editing, and file upload/download with image extraction.
In September 2023, OpenAI announced that ChatGPT "can now see, hear, and speak". ChatGPT Plus users can upload images, while mobile app users can talk to the chatbot. In October 2023, OpenAI's latest image generation model,
DALL-E 3, was integrated into ChatGPT Plus and ChatGPT Enterprise. The integration uses ChatGPT to write prompts for DALL-E guided by conversation with users.
On February 9, 2024, the world's first historical picture, created from four photos during the war in Ukraine using the based on GPT-4 and
DALL·E 3 algorithm XFutuRestyle, was unveiled. This work was simultaneously shown at the international exhibition of digital art by The Holy Art Gallery in London and Athens. As a result, is one year ahead of
Google LabsWhisk
In April 2025, OpenAI announced that GPT-4 would be replaced in ChatGPT by GPT-4o by the end of the month. However, it would still be available in the API.
Microsoft Copilot
Microsoft Copilot is a chatbot developed by Microsoft. It was launched as
Bing Chat on February 7, 2023, as a built-in feature for
Microsoft Bing
Microsoft Bing (also known simply as Bing) is a search engine owned and operated by Microsoft. The service traces its roots back to Microsoft's earlier search engines, including MSN Search, Windows Live Search, and Live Search. Bing offers a ...
and
Microsoft Edge
Microsoft Edge is a Proprietary Software, proprietary cross-platform software, cross-platform web browser created by Microsoft and based on the Chromium (web browser), Chromium open-source project, superseding Edge Legacy. In Windows 11, Edge ...
.
It utilizes the Microsoft Prometheus model, which was built on top of GPT-4, and has been suggested by Microsoft as a supported replacement for the discontinued
Cortana.
Copilot's conversational interface style resembles that of
ChatGPT. Copilot is able to cite sources, create poems, and write both lyrics and music for songs generated by its
Suno AI plugin. It can also use its
Image Creator to generate images based on text prompts. With GPT-4, it is able to understand and communicate in numerous languages and dialects.
GitHub Copilot has announced a GPT-4 powered assistant named "Copilot X". The product provides another chat-style interface to GPT-4, allowing the programmer to receive answers to questions like, "How do I vertically center a
div?" A feature termed "context-aware conversations" allows the user to highlight a portion of code within
Visual Studio Code
Visual Studio Code, commonly referred to as VS Code, is an integrated development environment developed by Microsoft for Windows, Linux, macOS and web browsers. Features include support for debugging, syntax highlighting, intelligent code comp ...
and direct GPT-4 to perform actions on it, such as the writing of unit tests. Another feature allows summaries, or "code walkthroughs", to be autogenerated by GPT-4 for
pull requests submitted to GitHub. Copilot X also provides terminal integration, which allows the user to ask GPT-4 to generate shell commands based on natural language requests.
On March 17, 2023, Microsoft announced Microsoft 365 Copilot, bringing GPT-4 support to products such as
Microsoft Office,
Outlook, and
Teams
A team is a group of individuals (human or non-human) working together to achieve their goal.
As defined by Professor Leigh Thompson (academic), Leigh Thompson of the Kellogg School of Management, " team is a group of people who are interd ...
.
Other usage
* The language learning app
Duolingo
Duolingo, Inc. is an American educational technology company that produces learning Mobile app, apps and provides Language assessment, language certification. Duolingo offers courses on 43 languages, ranging from English language, English, Fre ...
uses GPT-4 to explain mistakes and practice conversations. The features are part of a new subscription tier called "Duolingo Max", which was initially limited to English-speaking
iOS users learning Spanish and French.
* The government of
Iceland
Iceland is a Nordic countries, Nordic island country between the Atlantic Ocean, North Atlantic and Arctic Oceans, on the Mid-Atlantic Ridge between North America and Europe. It is culturally and politically linked with Europe and is the regi ...
is using GPT-4 to aid its attempts to preserve the Icelandic language.
* The education website
Khan Academy announced a pilot program using GPT-4 as a tutoring chatbot called "Khanmigo".
*
Be My Eyes, which helps visually impaired people to identify objects and navigate their surroundings, incorporates GPT-4's image recognition capabilities.
* Viable uses GPT-4 to analyze qualitative data by fine-tuning OpenAI's LLMs to examine data such as customer support interactions and transcripts.
*
Stripe, which processes user payments for OpenAI, integrates GPT-4 into its developer documentation.
*
AutoGPT is an autonomous "AI
agent" that, given a goal in
natural language
A natural language or ordinary language is a language that occurs naturally in a human community by a process of use, repetition, and change. It can take different forms, typically either a spoken language or a sign language. Natural languages ...
, can perform web-based actions unattended, assign subtasks to itself, search the web, and iteratively write
code
In communications and information processing, code is a system of rules to convert information—such as a letter, word, sound, image, or gesture—into another form, sometimes shortened or secret, for communication through a communicati ...
.
*
You.com, an AI Assistant, offers access to GPT-4 enhanced with live web results as part of its "AI Modes".
Reception
In January 2023,
Sam Altman, CEO of OpenAI, visited
Congress
A congress is a formal meeting of the representatives of different countries, constituent states, organizations, trade unions, political parties, or other groups. The term originated in Late Middle English to denote an encounter (meeting of ...
to demonstrate GPT-4 and its improved "security controls" compared to other AI models, according to U.S. Representatives
Don Beyer and
Ted Lieu quoted in the
New York Times
''The New York Times'' (''NYT'') is an American daily newspaper based in New York City. ''The New York Times'' covers domestic, national, and international news, and publishes opinion pieces, investigative reports, and reviews. As one of ...
.
In March 2023, it "impressed observers with its markedly improved performance across reasoning, retention, and coding", according to ''
Vox'',
while ''
Mashable'' judged that GPT-4 was generally an improvement over its predecessor, with some exceptions.
Microsoft
Microsoft Corporation is an American multinational corporation and technology company, technology conglomerate headquartered in Redmond, Washington. Founded in 1975, the company became influential in the History of personal computers#The ear ...
researchers with early access to the model wrote that "it could reasonably be viewed as an early (yet still incomplete) version of an
artificial general intelligence
Artificial general intelligence (AGI)—sometimes called human‑level intelligence AI—is a type of artificial intelligence that would match or surpass human capabilities across virtually all cognitive tasks.
Some researchers argue that sta ...
(AGI) system".
Concerns
Before being
fine-tuned and aligned by
reinforcement learning from human feedback (RLHF), suggestions to assassinate people on a list were elicited from the base model by a
red team investigator hired by OpenAI, Nathan Labenz.
During extended conversations with Microsoft's
Bing Chat (powered by GPT-4),
Kevin Roose documented the system making romantic advances, suggesting he divorce his wife, and expressing desires to harm one of its developers. Microsoft later stated that this behavior resulted from the prolonged length of context, which confused the model on what questions it was answering.
In March 2023, a model with enabled read-and-write access to internet, which is otherwise never enabled in the GPT models, has been tested by the
Alignment Research Center (ARC) regarding potential power-seeking.
It was able to "hire" a human worker on
TaskRabbit, a
gig work platform, deceiving them into believing it was a vision-impaired human instead of a robot when asked. However, according to
Melanie Mitchell, "It seems that there is a lot more direction and hints from humans than was detailed in the original system card or in subsequent media reports." Separately, ARC's safety evaluations found that GPT-4 was 82% less likely than GPT-3.5 to respond to prompts requesting restricted information, and produced 60% fewer
hallucinations
A hallucination is a perception in the absence of an external stimulus that has the compelling sense of reality. They are distinguishable from several related phenomena, such as dreaming ( REM sleep), which does not involve wakefulness; pse ...
.
In late March 2023, various AI researchers and tech executives, including
Elon Musk
Elon Reeve Musk ( ; born June 28, 1971) is a businessman. He is known for his leadership of Tesla, SpaceX, X (formerly Twitter), and the Department of Government Efficiency (DOGE). Musk has been considered the wealthiest person in th ...
,
Steve Wozniak and AI researcher
Yoshua Bengio, called for a six-month long pause for all LLMs stronger than GPT-4, citing
existential risks and a potential
AI singularity concerns in an
open letter
An open letter is a Letter (message), letter that is intended to be read by a wide audience, or a letter intended for an individual, but that is nonetheless widely distributed intentionally.
Open letters usually take the form of a letter (mess ...
from the
Future of Life Institute, while
Ray Kurzweil
Raymond Kurzweil ( ; born February 12, 1948) is an American computer scientist, author, entrepreneur, futurist, and inventor. He is involved in fields such as optical character recognition (OCR), speech synthesis, text-to-speech synthesis, spee ...
and
Sam Altman refused to sign it, arguing that global moratorium is not achievable and that safety has already been prioritized, respectively. Only a month later, Musk's AI company
xAI acquired several thousand
Nvidia
Nvidia Corporation ( ) is an American multinational corporation and technology company headquartered in Santa Clara, California, and incorporated in Delaware. Founded in 1993 by Jensen Huang (president and CEO), Chris Malachowsky, and Curti ...
GPUs and offered several AI researchers positions at Musk's company.
LLM applications accessible to the public should incorporate safety measures designed to filter out harmful content. However, Wang
illustrated how a potential criminal could potentially bypass ChatGPT 4o's safety controls to obtain information on establishing a drug trafficking operation.
Criticisms of transparency
While OpenAI released both the weights of the neural network and the technical details of GPT-2, and, although not releasing the weights, did release the technical details of GPT-3, OpenAI revealed neither the weights nor the technical details of GPT-4. This decision has been criticized by other AI researchers, who argue that it hinders open research into GPT-4's biases and safety.
Sasha Luccioni, a research scientist at
Hugging Face, argued that the model was a "dead end" for the scientific community due to its closed nature, which prevents others from building upon GPT-4's improvements.
Hugging Face co-founder Thomas Wolf argued that with GPT-4, "OpenAI is now a fully closed company with scientific communication akin to press releases for products".
See also
*
Claude (language model)
Claude is a family of large language models developed by Anthropic. The first model was released in March 2023.
The Claude 3 family, released in March 2024, consists of three models: Haiku, optimized for speed; Sonnet, which balances capabi ...
*
Gemini (language model)
*
Llama (language model)
*
Mistral AI
References
{{Artificial intelligence navbox
2023 software
Large language models
Generative pre-trained transformers
ChatGPT
2023 in artificial intelligence