Generative Pre-trained Transformer 4 (GPT-4) is a multimodal large language model created by

OpenAI OpenAI is an artificial intelligence (AI) research laboratory consisting of the for-profit corporation OpenAI LP and its parent company, the non-profit OpenAI Inc. The company conducts research in the field of AI with the stated goal of promo ...

and the fourth in its GPT series. It was released on March 14, 2023, and has been made publicly available in a limited form via

ChatGPT ChatGPT (Generative Pre-trained Transformer) is a chatbot launched by OpenAI in November 2022. It is built on top of OpenAI's GPT-3 family of large language models, and is fine-tuned (an approach to transfer learning) with both supervised and ...

Plus, with access to its commercial

API An application programming interface (API) is a way for two or more computer programs to communicate with each other. It is a type of software interface, offering a service to other pieces of software. A document or standard that describes how ...

being provided via a waitlist. As a

transformer A transformer is a passive component that transfers electrical energy from one electrical circuit to another circuit, or multiple circuits. A varying current in any coil of the transformer produces a varying magnetic flux in the transformer' ...

, GPT-4 was pretrained to predict the next token (using both public data and "data licensed from third-party providers"), and was then fine-tuned with reinforcement learning from human and AI feedback for human alignment and policy compliance. Observers reported the GPT-4 based version of ChatGPT to be an improvement on the previous (GPT-3.5 based) ChatGPT, with the caveat that GPT-4 retains some of the same problems. Unlike the predecessors, GPT-4 can take images as well as text as input. OpenAI has declined to reveal technical information such as the size of the GPT-4 model.

Background

OpenAI published their first paper on GPT in 2018, called "Improving Language Understanding by Generative Pre-Training." They also released GPT-1, a model based on the Transformer architecture that was trained on a large corpus of books. The next year, they introduced GPT-2, a larger model that could generate coherent text. In 2020, they introduced GPT-3, a model with 100 times the number of parameters as GPT-2, that could perform various tasks with few examples. GPT-3 was further improved into GPT-3.5, which was used to create

Capabilities

OpenAI stated that GPT-4 is "more reliable, creative, and able to handle much more nuanced instructions than GPT-3.5." They produced two versions of GPT-4, with context windows of 8,192 and 32,768 tokens, a significant improvement over GPT-3.5 and GPT-3, which were limited to 4,096 and 2,049 tokens respectively. Unlike its predecessors, GPT-4 is a multimodal model: it can take images as well as text as input; this gives it the ability to describe the humor in unusual images, summarize screen-shot text, and answer exam questions that contain diagrams. To gain further control over GPT-4, OpenAI introduced the "system message", a directive in natural language given to GPT-4 in order to specify its tone of voice and task. For example, the system message can instruct the model to "be a Shakespearean pirate", in which case it will respond in rhyming, Shakespearean prose, or request it to "always write the output of tsresponse in

JSON JSON (JavaScript Object Notation, pronounced ; also ) is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and arrays (or other ser ...

", in which case the model will do so, adding keys and values as it sees fit to match the structure of its reply. In the examples provided by OpenAI, GPT-4 refused to deviate from its system message despite requests to do otherwise by the user during the conversation.

Aptitude on standardized tests

GPT-4 demonstrates aptitude on several standardized tests. OpenAI claims that in their own testing the model received a score of 1410 on the

SAT The SAT ( ) is a standardized test widely used for college admissions in the United States. Since its debut in 1926, its name and scoring have changed several times; originally called the Scholastic Aptitude Test, it was later called the Schol ...

(94th percentile), 163 on the

LSAT The Law School Admission Test (LSAT; ) is a standardized test administered by the Law School Admission Council (LSAC) for prospective law school candidates. It is designed to assess reading comprehension as well as logical and verbal rea ...

(88th percentile), and 298 on the

Uniform Bar Exam In the United States, those seeking to become lawyers must normally pass a bar examination before they can be admitted to the bar and become licensed to practice law. Bar exams are administered by states or territories, generally by agencies under ...

(90th percentile). In contrast, OpenAI claims that GPT-3.5 received scores for the same exams in the 82nd, 40th, and 10th percentiles respectively.

Medical knowledge

Researchers from Microsoft tested GPT-4 on medical problems and found "that GPT-4, without any specialized prompt crafting, exceeds the passing score on

USMLE The United States Medical Licensing Examination (USMLE) is a three-step examination program for medical licensure in the United States sponsored by the Federation of State Medical Boards (FSMB) and the National Board of Medical Examiners (NBME). Ph ...

by over 20 points and outperforms earlier general-purpose models (GPT-3.5) as well as models specifically fine-tuned on medical knowledge ( Med-PaLM, a prompt-tuned version of Flan-PaLM 540B)".

Training

OpenAI did not release the technical details of GPT-4; the technical report explicitly refrained from specifying the model size, architecture, or hardware used during either training or

inference Inferences are steps in reasoning, moving from premises to logical consequences; etymologically, the word '' infer'' means to "carry forward". Inference is theoretically traditionally divided into deduction and induction, a distinction that in ...

. While the report described that the model was trained using a combination of first

supervised learning Supervised learning (SL) is a machine learning paradigm for problems where the available data consists of labelled examples, meaning that each data point contains features (covariates) and an associated label. The goal of supervised learning alg ...

on a large

dataset A data set (or dataset) is a collection of data. In the case of tabular data, a data set corresponds to one or more database tables, where every column of a table represents a particular variable, and each row corresponds to a given record of the ...

, then reinforcement learning using both human and AI feedback, it did not provide details of the training, including the process by which the training dataset was constructed, the computing power required, or any

hyperparameters In Bayesian statistics, a hyperparameter is a parameter of a prior distribution; the term is used to distinguish them from parameters of the model for the underlying system under analysis. For example, if one is using a beta distribution to mo ...

such as the

learning rate In machine learning and statistics, the learning rate is a tuning parameter in an optimization algorithm that determines the step size at each iteration while moving toward a minimum of a loss function. Since it influences to what extent newly ac ...

, epoch count, or optimizer(s) used. The report claimed that "the competitive landscape and the safety implications of large-scale models" were factors that influenced this decision.

Alignment

According to their report, OpenAI conducted internal adversarial testing on and before GPT-4's launch date with dedicated red teams composed of researchers and industry professionals to mitigate potential vulnerabilities. As part of these efforts, they granted the Alignment Research Center (ARC) early access to the models to assess power-seeking risks. As an example, ARC found that the model was able to encourage a TaskRabbit

crowdsource Crowdsourcing involves a large group of dispersed participants contributing or producing goods or services—including ideas, votes, micro-tasks, and finances—for payment or as volunteers. Contemporary crowdsourcing often involves digita ...

worker to solve a

CAPTCHA A CAPTCHA ( , a contrived acronym for "Completely Automated Public Turing test to tell Computers and Humans Apart") is a type of challenge–response test used in computing to determine whether the user is human. The term was coined in 2003 ...

for it, but was not able to autonomously replicate or acquire resources. In order to properly refuse harmful prompts, outputs from GPT-4 were tweaked using the model itself as a tool. A GPT-4 classifier serving as a rule-based reward model (RBRM) would take prompts, the corresponding output from the GPT-4 policy model, and a human-written set of rules to classify the output according to the rubric. GPT-4 was then rewarded for refusing to respond to harmful prompts as classified by the RBRM.

Reception

U.S. Representatives

Don Beyer Donald Sternoff Beyer Jr. (; born June 20, 1950) is an American businessman, diplomat, and politician serving as the U.S. representative for since 2015. A member of the Democratic Party, his district is in the heart of Northern Virginia and incl ...

and

Ted Lieu Ted W. Lieu (; born March 29, 1969) is an American politician and Air Force Reserve Command colonel who has represented California's 33rd congressional district in the U.S. House of Representatives since 2015. The district includes much of wes ...

confirmed to the

New York Times ''The New York Times'' (''the Times'', ''NYT'', or the Gray Lady) is a daily newspaper based in New York City with a worldwide readership reported in 2020 to comprise a declining 840,000 paid print subscribers, and a growing 6 million paid d ...

that

Sam Altman Samuel H. Altman ( ; born April 22, 1985) is an American entrepreneur, investor, programmer, and blogger. He is the CEO of OpenAI and the former president of Y Combinator. Early life and education Altman grew up in St. Louis, Missouri; his mothe ...

, CEO of OpenAI, visited

Congress A congress is a formal meeting of the representatives of different countries, constituent states, organizations, trade unions, political parties, or other groups. The term originated in Late Middle English to denote an encounter (meeting of a ...

in January 2023 to demonstrate GPT-4 and its improved "security controls" compared to other AI models. According to '' Vox'', GPT-4 "impressed observers with its markedly improved performance across reasoning, retention, and coding." ''

Mashable Mashable is a digital media platform, news website and entertainment company founded by Pete Cashmore in 2005. History Mashable was founded by Pete Cashmore while living in Aberdeen, Scotland, in July 2005. Early iterations of the site were a ...

'' agreed that GPT-4 was usually a significant improvement, but also judged that GPT-3 would occasionally give better answers in a side-by-side comparison. Microsoft Research tested the model behind GPT-4 and concluded that "it could reasonably be viewed as an early (yet still incomplete) version of an

artificial general intelligence Artificial general intelligence (AGI) is the ability of an intelligent agent to understand or learn any intellectual task that a human being can. It is a primary goal of some artificial intelligence research and a common topic in science fictio ...

(AGI) system".

AI safety concerns

In late March 2023, an open letter from the

Future of Life Institute The Future of Life Institute (FLI) is a nonprofit organization that works to reduce global catastrophic and existential risks facing humanity, particularly existential risk from advanced artificial intelligence (AI). The Institute's work is mad ...

signed by various AI researchers and tech executives called for the pausing of all training of AIs stronger than GPT-4 for 6 months, citing AI safety concerns amid a race of progress in the field. The signatories, which included figures such as AI pioneer

Yoshua Bengio Yoshua Bengio (born March 5, 1964) is a Canadian computer scientist, most noted for his work on artificial neural networks and deep learning. He is a professor at the Department of Computer Science and Operations Research at the Université ...

Apple An apple is an edible fruit produced by an apple tree (''Malus domestica''). Apple fruit tree, trees are agriculture, cultivated worldwide and are the most widely grown species in the genus ''Malus''. The tree originated in Central Asia, wh ...

co-founder

Steve Wozniak Stephen Gary Wozniak (; born August 11, 1950), also known by his nickname "Woz", is an American electronics engineer, computer programmer, philanthropist, inventor, and technology entrepreneur. In 1976, with business partner Steve Jobs, he c ...

, and Tesla CEO

Elon Musk Elon Reeve Musk ( ; born June 28, 1971) is a business magnate and investor. He is the founder, CEO and chief engineer of SpaceX; angel investor, CEO and product architect of Tesla, Inc.; owner and CEO of Twitter, Inc.; founder of The Bori ...

, expressed concern about both near-term and existential risks of AI development such as a potential AI singularity. OpenAI CEO Sam Altman did not sign the letter, arguing that OpenAI already prioritizes safety.

Criticisms

While OpenAI released both the weights of the neural network and the technical details of GPT-2, and, although not releasing the weights, did release the technical details of GPT-3, OpenAI did not reveal either the weights or the technical details of GPT-4. This decision has been criticized by other AI researchers, who argue that it hinders open research into GPT-4's biases and safety. Sasha Luccioni, a research scientist at HuggingFace, argued that the model was a "dead end" for the scientific community due to its closed nature, which prevents others from building upon GPT-4's improvements. HuggingFace co-founder Thomas Wolf argued that with GPT-4, "OpenAI is now a fully closed company with scientific communication akin to press releases for products". Like its predecessor, GPT-4 has been known to "hallucinate". The model has also been criticized for generating hateful, biased, and racist information.

Usage

ChatGPT Plus

Plus is a GPT-4 backed version of ChatGPT available for a 20 USD per month subscription fee (the original version is backed by GPT-3.5). OpenAI also makes GPT-4 available to a select group of applicants through their GPT-4 API waitlist; after being accepted, an additional fee of 0.03 USD per 1000 tokens in the initial text provided to the model ("prompt"), and 0.06 USD per 1000 tokens that the model generates ("completion"), is required to use the version of the model with a 8192-token context window; for the 32768-token version, those prices are doubled.

Duolingo

Duolingo Duolingo ( ) is an American educational technology company which produces learning apps and provides language certification. On its main app, users can practice vocabulary, grammar, pronunciation and listening skills using spaced repetition. D ...

integrated GPT-4 in their application through two new features, "Roleplay" and "Explain My Answer". The first version of this update is aimed only at English speakers who are learning French or Spanish, with plans to extend the features to other languages in the future.

Miðeind ehf

Iceland Iceland ( is, Ísland; ) is a Nordic island country in the North Atlantic Ocean and in the Arctic Ocean. Iceland is the most sparsely populated country in Europe. Iceland's capital and largest city is Reykjavík, which (along with its s ...

ic start-up Miðeind ehf, which works on

language preservation Language preservation is the preservation of endangered or dead languages. With language death, studies in linguistics, anthropology, prehistory and psychology lose diversity. As history is remembered with the help of historic preservation, languag ...

, was selected by OpenAI as one of six companies to participate in an early beta test program of the new model.

Khan Academy

Khan Academy Khan Academy is an American non-profit educational organization created in 2008 by Sal Khan. Its goal is creating a set of online tools that help educate students. The organization produces short lessons in the form of videos. Its website also in ...

uses GPT-4 to create a tutoring chatbot, which the organization names "Khanmigo". While it is in the "research phase", access to the chatbot is provided free to the students and teachers of 500 school districts who have "partnered" with Khan Academy. Public access is only offered to a limited number of users selected from a waitlist; after acceptance, a 20 USD per month fee is required to use the technology.

Be My Eyes

Be My Eyes, which helps visually impaired people to identify objects and navigate their surroundings, was the first app to incorporate GPT-4's image recognition capabilities, through a new "Virtual Volunteer" feature. The feature is an alternative to relying on human volunteers for the same tasks. The ''Be My Eyes'' "Virtual Volunteer" is in beta testing.

GitHub Copilot

GitHub Copilot GitHub Copilot is a cloud-based artificial intelligence tool developed by GitHub and OpenAI to assist users of Visual Studio Code, Visual Studio, Neovim, and JetBrains integrated development environments (IDEs) by autocompleting code. Currently ...

announced a GPT-4 powered assistant named "Copilot X". The product provides another chat-style interface to GPT-4, allowing the programmer to receive answers to questions like "how do I vertically center a

div Div or DIV may refer to: Science and technology * Division (mathematics), the mathematical operation that is the inverse of multiplication * Span and div, HTML tags that implement generic elements * div, a C mathematical function * Divergence, ...

?". A feature termed "context-aware conversations" allows the user to highlight a portion of code within

Visual Studio Code Visual Studio Code, also commonly referred to as VS Code, is a source-code editor made by Microsoft with the Electron Framework, for Windows, Linux and macOS. Features include support for debugging, syntax highlighting, intelligent code complet ...

and direct GPT-4 to perform actions on it, such as the writing of unit tests. Another feature allows summaries, or "code walkthroughs", to be autogenerated by GPT-4 for pull requests submitted to GitHub. Copilot X also provides terminal integration, which allows the user to ask GPT-4 to generate shell commands based on natural language requests. , while GitHub provides access to a limited number of people selected through a waitlist, the release date as well as the cost of the product are still to be announced.

Microsoft Bing

Microsoft 365 Copilot

On 17 March 2023, Microsoft announced further integration of GPT-4 into its products, revealing Microsoft 365 Copilot, "embedded in the apps millions of people use everyday:

Word A word is a basic element of language that carries an semantics, objective or pragmatics, practical semantics, meaning, can be used on its own, and is uninterruptible. Despite the fact that language speakers often have an intuitive grasp of w ...

Excel ExCeL London (an abbreviation for Exhibition Centre London) is an exhibition centre, international convention centre and former hospital in the Custom House area of Newham, East London. It is situated on a site on the northern quay of the ...

PowerPoint Microsoft PowerPoint is a presentation program, created by Robert Gaskins and Dennis Austin at a software company named Forethought, Inc. It was released on April 20, 1987, initially for Macintosh computers only. Microsoft acquired PowerPoi ...

Outlook Outlook or The Outlook may refer to: Computing * Microsoft Outlook, an e-mail and personal information management software product from Microsoft * Outlook.com, a web mail service from Microsoft * Outlook on the web, a suite of web applications ...

, Teams, and more".

Stripe

Stripe utilizes GPT-4 to help with fraud detection, and to try to improve other aspects of the user experience.

Potential applications

Multimodal AI models such as GPT-4 may offer benefits for

personalized medicine Personalized medicine, also referred to as precision medicine, is a medical model that separates people into different groups—with medical decisions, practices, interventions and/or products being tailored to the individual patient based on the ...

(tailoring treatments and interventions to individual patients based on their unique genetic and environmental factors) as well as remote healthcare by being able to act as virtual health assistants, or by helping to identify the most effective approaches. They may also help in the areas of digital clinical trials, pandemic surveillance, and digital twin technology.

References

{{Differentiable computing OpenAI Large language models 2023 software