
OpenAI Codex is an artificial intelligence model developed by OpenAI. It parses natural language and generates code in response. It powers GitHub Copilot, a programming autocompletion tool for select IDEs, such as Visual Studio Code and Neovim. Codex is a descendant of OpenAI's GPT-3 model, fine-tuned for use in programming applications. OpenAI released an API for Codex in closed beta. In March 2023, OpenAI shut down access to Codex. Following public appeals from researchers, OpenAI reversed course. The Codex model can still be used by researchers in the OpenAI Research Access Program.


Capabilities

Based on GPT-3, a neural network trained on text, Codex was additionally trained on 159 gigabytes of Python code from 54 million GitHub repositories. A typical use case of Codex is for a user to type a comment, such as "//compute the moving average of an array for a given window size", then use the AI to suggest a block of code that satisfies that comment prompt. OpenAI stated that Codex can complete approximately 37% of requests and is meant to make human programming faster rather than to replace it. According to OpenAI's blog, Codex excels most at "mapping... simple problems to existing code", which they describe as "probably the least fun part of programming". Jeremy Howard, co-founder of Fast.ai, stated that "Codex is a way of getting code written without having to write as much code" and that "it is not always correct, but it is just close enough". According to a paper written by OpenAI researchers, when Codex attempted each test case 100 times, it generated working solutions for 70.2% of prompts. OpenAI claims that Codex can create code in over a dozen programming languages, including Go, JavaScript, Perl, PHP, Ruby, Shell, Swift, and TypeScript, though it is most effective in Python. According to VentureBeat, demonstrations uploaded by OpenAI showed impressive coreference resolution capabilities. The demonstrators were able to create a browser game in JavaScript and generate data science charts using matplotlib. OpenAI showed that Codex can interface with services and apps such as Mailchimp, Microsoft Word, Spotify, and Google Calendar.
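
The comment-driven use case described above can be illustrated with a hand-written sketch. The code below is not actual Codex output; it shows the kind of function a user might hope to receive for the moving-average prompt, rewritten as a Python comment. The function name `moving_average` and its behaviour for oversized windows are assumptions made for illustration.

```python
from typing import List, Sequence


# compute the moving average of an array for a given window size
def moving_average(values: Sequence[float], window: int) -> List[float]:
    """Return the mean of each consecutive `window`-sized slice of `values`."""
    if window <= 0:
        raise ValueError("window must be a positive integer")
    if window > len(values):
        return []  # no full window fits (an illustrative design choice)

    # Start with the sum of the first full window.
    window_sum = sum(values[:window])
    averages = [window_sum / window]

    # Slide the window one element at a time, updating the running sum
    # instead of re-adding every element.
    for i in range(window, len(values)):
        window_sum += values[i] - values[i - window]
        averages.append(window_sum / window)
    return averages
```

Calling `moving_average([1, 2, 3, 4, 5], 2)` walks a two-element window across the list and returns one average per position.
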
Microsoft is reportedly interested in exploring Codex's capabilities.
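
The figure quoted above for 100 attempts per test case is an instance of what is commonly called a pass@k metric: a prompt counts as solved if at least one of k sampled completions passes its tests. The sketch below assumes that n samples are generated per prompt, of which c pass, and computes the probability that a random subset of k samples contains at least one passing sample. The function names and the input layout are illustrative assumptions, and whether this matches the exact procedure behind the 70.2% figure is not stated in the text above.

```python
from math import comb
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples, drawn without
    replacement from n generated samples of which c pass, is correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) counts the k-subsets made up only of failing samples.
    # math.comb returns 0 when k > n - c, so the result is then 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: Iterable[Tuple[int, int]], k: int) -> float:
    """Average pass@k over prompts; each result is (n_samples, n_passing)."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    return sum(scores) / len(scores)
```

With k equal to n, the score for a single prompt is 1.0 as soon as any one sample passes, which is why a many-attempt success rate can be much higher than a single-attempt one.
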


Issues

OpenAI demonstrations showcased flaws such as inefficient code and one-off quirks in code samples. In an interview with The Verge, OpenAI chief technology officer Greg Brockman said that "sometimes [Codex] doesn't quite know exactly what you're asking" and that it can require some trial and error. OpenAI researchers found that Codex struggles with multi-step prompts, often failing or yielding counter-intuitive behavior. Additionally, they brought up several safety issues, such as over-reliance by novice programmers, biases based on the training data, and security impacts due to vulnerable code. VentureBeat stated that because Codex is trained on public data, it could be vulnerable to "data poisoning" via intentional uploads of malicious code. According to a study by researchers from New York University, approximately 40% of code generated by GitHub Copilot (which uses Codex) in scenarios relevant to high-risk CWEs (Common Weakness Enumeration entries) included glitches or other exploitable design flaws.
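
To make the notion of vulnerable generated code concrete, the following sketch contrasts a query built by string interpolation with a parameterized one. SQL injection (CWE-89) is used here only as a familiar example of a weakness class; the code is hand-written for illustration and is not taken from the study or from Copilot output. The table layout and function names are assumptions.

```python
import sqlite3


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Vulnerable: user input is pasted into the SQL text (CWE-89),
    # so a crafted name can change the meaning of the query.
    query = f"SELECT id, name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # Safer: the value is bound through a placeholder and is never
    # parsed as SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


def make_demo_db() -> sqlite3.Connection:
    # Hypothetical in-memory table used only for this demonstration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)]
    )
    conn.commit()
    return conn
```

Passing the string `x' OR '1'='1` to the unsafe function returns every row in the table, while the parameterized version treats the same string as an ordinary name and finds no match.
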


Copyright

The Free Software Foundation expressed concerns that code snippets generated by Copilot and Codex could violate copyright, in particular the condition of the GPL that requires derivative works to be licensed under equivalent terms. Issues they raised include whether training on public repositories qualifies as fair use, how developers could discover infringing generated code, whether trained machine learning models could be considered modifiable source code or a compilation of the training data, and whether machine learning models could themselves be copyrighted and by whom. An internal GitHub study found that approximately 0.1% of generated code contained direct copies from the training data. In one example, the model output the training data code implementing the fast inverse square root algorithm, including comments and an incorrect copyright notice. In response, OpenAI stated that "legal uncertainty on the copyright implications of training AI systems imposes substantial costs on AI developers and so should be authoritatively resolved." The copyright issues with Codex have been compared to the Authors Guild, Inc. v. Google, Inc. court case, in which judges ruled that Google Books's use of text snippets from millions of scanned books constituted fair use.

