A coprocessor is a computer processor used to supplement the functions of the primary processor (the
CPU). Operations performed by the coprocessor may be
floating-point arithmetic
In computing, floating-point arithmetic (FP) is arithmetic that represents real numbers approximately, using an integer with a fixed precision, called the significand, scaled by an integer exponent of a fixed base. For example, 12.345 can be ...
,
graphics
Graphics () are visual images or designs on some surface, such as a wall, canvas, screen, paper, or stone, to inform, illustrate, or entertain. In contemporary usage, it includes a pictorial representation of data, as in design and manufacture ...
,
signal processing
Signal processing is an electrical engineering subfield that focuses on analyzing, modifying and synthesizing ''signals'', such as audio signal processing, sound, image processing, images, and scientific measurements. Signal processing techniq ...
,
string processing,
cryptography
Cryptography, or cryptology (from grc, , translit=kryptós "hidden, secret"; and ''graphein'', "to write", or ''-logia'', "study", respectively), is the practice and study of techniques for secure communication in the presence of adver ...
or
I/O interfacing with peripheral devices. By offloading processor-intensive tasks from the
main processor, coprocessors can accelerate system performance. Coprocessors allow a line of computers to be customized, so that customers who do not need the extra performance do not need to pay for it.
Functionality
Coprocessors vary in their degree of autonomy. Some (such as
FPUs) rely on direct control via
coprocessor instructions, embedded in the
CPU's
instruction stream. Others are independent processors in their own right, capable of working asynchronously; they are still not optimized for
general-purpose code, or they are incapable of it due to a limited
instruction set
In computer science, an instruction set architecture (ISA), also called computer architecture, is an abstract model of a computer. A device that executes instructions described by that ISA, such as a central processing unit (CPU), is called an ' ...
focused on
accelerating specific tasks. It is common for these to be driven by
direct memory access
Direct memory access (DMA) is a feature of computer systems and allows certain hardware subsystems to access main system memory independently of the central processing unit (CPU).
Without DMA, when the CPU is using programmed input/output, it is t ...
(DMA), with the host processor (a CPU) building a
command list. The
PlayStation 2
The PlayStation 2 (PS2) is a home video game console developed and marketed by Sony Computer Entertainment. It was first released in Japan on 4 March 2000, in North America on 26 October 2000, in Europe on 24 November 2000, and in Australia on 3 ...
's
Emotion Engine
The Emotion Engine is a central processing unit developed and manufactured by Sony Computer Entertainment and Toshiba for use in the PlayStation 2 video game console. It was also used in early PlayStation 3 models sold in Japan and North Americ ...
contained an unusual
DSP
DSP may refer to:
Computing
* Digital signal processing, the mathematical manipulation of an information signal
* Digital signal processor, a microprocessor designed for digital signal processing
* Yamaha DSP-1, a proprietary digital signal ...
-like
SIMD
Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal (part of the hardware design) and it can be directly accessible through an instruction set architecture (ISA), but it shoul ...
vector unit capable of both modes of operation.
History
To make the best use of
mainframe computer
A mainframe computer, informally called a mainframe or big iron, is a computer used primarily by large organizations for critical applications like bulk data processing for tasks such as censuses, industry and consumer statistics, enterpris ...
processor time, input/output tasks were delegated to separate systems called
Channel I/O
In computing, channel I/O is a high-performance input/output (I/O) architecture that is implemented in various forms on a number of computer architectures, especially on mainframe computers. In the past, channels were generally implemented with cus ...
. The mainframe would not require any I/O processing at all, instead would just set parameters for an input or output operation and then signal the channel processor to carry out the whole of the operation. By dedicating relatively simple sub-processors to handle time-consuming I/O formatting and processing, overall system performance was improved.
Coprocessors for floating-point arithmetic first appeared in
desktop computer
A desktop computer (often abbreviated desktop) is a personal computer designed for regular use at a single location on or near a desk due to its size and power requirements. The most common configuration has a case that houses the power supply ...
s in the 1970s and became common throughout the 1980s and into the early 1990s. Early 8-bit and 16-bit processors used software to carry out
floating-point
In computing, floating-point arithmetic (FP) is arithmetic that represents real numbers approximately, using an integer with a fixed precision, called the significand, scaled by an integer exponent of a fixed base. For example, 12.345 can b ...
arithmetic operations. Where a coprocessor was supported, floating-point calculations could be carried out many times faster. Math coprocessors were popular purchases for users of
computer-aided design
Computer-aided design (CAD) is the use of computers (or ) to aid in the creation, modification, analysis, or optimization of a design. This software is used to increase the productivity of the designer, improve the quality of design, improve c ...
(CAD) software and scientific and engineering calculations. Some floating-point units, such as the
AMD 9511
Advanced Micro Devices, Inc. (AMD) is an American multinational semiconductor company based in Santa Clara, California, that develops computer processors and related technologies for business and consumer markets. While it initially manufactu ...
,
Intel 8231/8232 and
Weitek
Weitek Corporation was an American chip-design company that originally focused on floating-point units for a number of commercial CPU designs. During the early to mid-1980s, Weitek designs could be found powering a number of high-end designs ...
FPUs were treated as peripheral devices, while others such as the
Intel 8087,
Motorola 68881 and
National 32081 were more closely integrated with the CPU.
Another form of coprocessor was a video display coprocessor, as used in the
Atari 8-bit family
The Atari 8-bit family is a series of 8-bit home computers introduced by Atari, Inc. in 1979 as the Atari 400 and Atari 800. The series was successively upgraded to Atari 1200XL , Atari 600XL, Atari 800XL, Atari 65XE, Atari 130XE, Atari 800XE, ...
,
TI-99/4A
The TI-99/4 and TI-99/4A are home computers released by Texas Instruments in 1979 and 1981, respectively. Based on the Texas Instruments TMS9900 microprocessor originally used in minicomputers, the TI-99/4 was the first 16-bit home computer. ...
, and
MSX home computers, which were called "
Video Display Controller
A video display controller or VDC (also called a display engine or display interface) is an integrated circuit which is the main component in a video-signal generator, a device responsible for the production of a TV video signal in a computing ...
s". The
Amiga
Amiga is a family of personal computers introduced by Commodore in 1985. The original model is one of a number of mid-1980s computers with 16- or 32-bit processors, 256 KB or more of RAM, mouse-based GUIs, and significantly improved graphi ...
custom chipset includes such a unit known as the
Copper
Copper is a chemical element with the symbol Cu (from la, cuprum) and atomic number 29. It is a soft, malleable, and ductile metal with very high thermal and electrical conductivity. A freshly exposed surface of pure copper has a pinkis ...
, as well as a
blitter
A blitter is a circuit, sometimes as a coprocessor or a logic block on a microprocessor, dedicated to the rapid movement and modification of data within a computer's memory. A blitter can copy large quantities of data from one memory area to anot ...
for accelerating
bitmap
In computing, a bitmap is a mapping from some domain (for example, a range of integers) to bits. It is also called a bit array
A bit array (also known as bitmask, bit map, bit set, bit string, or bit vector) is an array data structure that c ...
manipulation in memory.
As microprocessors developed, the cost of integrating the floating point arithmetic functions into the processor declined. High processor speeds also made a closely integrated coprocessor difficult to implement. Separately packaged mathematics coprocessors are now uncommon in desktop computers. The demand for a
dedicated graphics coprocessor has grown, however, particularly due to the increasing demand for realistic
3D graphics in
computer games.
Intel
The original
IBM PC
The IBM Personal Computer (model 5150, commonly known as the IBM PC) is the first microcomputer released in the IBM PC model line and the basis for the IBM PC compatible de facto standard. Released on August 12, 1981, it was created by a team ...
included a socket for the
Intel 8087 floating-point
In computing, floating-point arithmetic (FP) is arithmetic that represents real numbers approximately, using an integer with a fixed precision, called the significand, scaled by an integer exponent of a fixed base. For example, 12.345 can b ...
coprocessor (aka
FPU) which was a popular option for people using the PC for
computer-aided design
Computer-aided design (CAD) is the use of computers (or ) to aid in the creation, modification, analysis, or optimization of a design. This software is used to increase the productivity of the designer, improve the quality of design, improve c ...
or mathematics-intensive calculations. In that architecture, the coprocessor speeds up floating-point arithmetic on the order of fiftyfold. Users that only used the PC for word processing, for example, saved the high cost of the coprocessor, which would not have accelerated performance of text manipulation operations.
The 8087 was tightly integrated with the
8086
The 8086 (also called iAPX 86) is a 16-bit microprocessor chip designed by Intel between early 1976 and June 8, 1978, when it was released. The Intel 8088, released July 1, 1979, is a slightly modified chip with an external 8-bit data bus (allowi ...
/8088 and responded to floating-point
machine code
In computer programming, machine code is any low-level programming language, consisting of machine language instructions, which are used to control a computer's central processing unit (CPU). Each instruction causes the CPU to perform a very ...
operation codes inserted in the 8088 instruction stream. An 8088 processor without an 8087 could not interpret these instructions, requiring separate versions of programs for FPU and non-FPU systems, or at least a test at run time to detect the FPU and select appropriate mathematical library functions.
Another coprocessor for the 8086/8088 central processor was the
8089 input/output coprocessor. It used the same programming technique as 8087 for input/output operations, such as transfer of data from memory to a peripheral device, and so reducing the load on the CPU. But IBM didn't use it in IBM PC design and Intel stopped development of this type of coprocessor.
The
Intel 80386
The Intel 386, originally released as 80386 and later renamed i386, is a 32-bit microprocessor introduced in 1985. The first versions had 275,000 transistors[microprocessor
A microprocessor is a computer processor where the data processing logic and control is included on a single integrated circuit, or a small number of integrated circuits. The microprocessor contains the arithmetic, logic, and control circu ...]
used an optional "math" coprocessor (the
80387) to perform floating point operations directly in
hardware. The Intel 80486DX processor included floating-point hardware on the chip. Intel released a cost-reduced processor, the 80486SX, that had no floating point hardware, and also sold an 80487SX coprocessor that essentially disabled the main processor when installed, since the 80487SX was a complete 80486DX with a different set of pin connections.
[Scott Mueller, ''Upgrading and repairing PCs '' 15th edition, Que Publishing, 2003 , pages 108–110]
Intel processors later than the 80486 integrated floating-point hardware on the main processor chip; the advances in integration eliminated the cost advantage of selling the floating point processor as an optional element. It would be very difficult to adapt circuit-board techniques adequate at 75 MHz processor speed to meet the time-delay, power consumption, and radio-frequency interference standards required at gigahertz-range clock speeds. These on-chip floating point processors are still referred to as coprocessors because they operate in parallel with the main CPU.
During the era of 8- and 16-bit desktop computers another common source of floating-point coprocessors was
Weitek
Weitek Corporation was an American chip-design company that originally focused on floating-point units for a number of commercial CPU designs. During the early to mid-1980s, Weitek designs could be found powering a number of high-end designs ...
. These coprocessors had a different instruction set from the Intel coprocessors, and used a different socket, which not all motherboards supported. The Weitek processors did not provide transcendental mathematics functions (for example, trigonometric functions) like the Intel x87 family, and required specific software libraries to support their functions.
[Scott Mueller, ''Upgrading and Repairing PCs, Second Edition'', Que Publishing, 1992 , pp. 412-413]
Motorola
The
Motorola 68000
The Motorola 68000 (sometimes shortened to Motorola 68k or m68k and usually pronounced "sixty-eight-thousand") is a 16/32-bit complex instruction set computer (CISC) microprocessor, introduced in 1979 by Motorola Semiconductor Products Sector ...
family had the
68881/68882 coprocessors which provided similar floating-point speed acceleration as for the Intel processors. Computers using the 68000 family but not equipped with the hardware floating point processor could trap and emulate the floating-point instructions in software, which, although slower, allowed one binary version of the program to be distributed for both cases. The 68451 memory-management coprocessor was designed to work with the 68020 processor.
[William Ford, William R. Topp
''Assembly language and systems programming for the M68000 family'' Jones & Bartlett Learning, 1992 page 892 and ff.]
Modern coprocessors
, dedicated Graphics Processing Units (
GPU
A graphics processing unit (GPU) is a specialized electronic circuit designed to manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, mobi ...
s) in the form of
graphics card
A graphics card (also called a video card, display card, graphics adapter, VGA card/VGA, video adapter, display adapter, or mistakenly GPU) is an expansion card which generates a feed of output images to a display device, such as a computer moni ...
s are commonplace. Certain models of
sound card
A sound card (also known as an audio card) is an internal expansion card that provides input and output of audio signals to and from a computer under the control of computer programs. The term ''sound card'' is also applied to external audio i ...
s have been fitted with dedicated processors providing digital multichannel mixing and real-time DSP effects as early as 1990 to 1994 (the
Gravis Ultrasound and
Sound Blaster AWE32
The Sound Blaster AWE32 is an ISA sound card from Creative Technology. It is an expansion board for PCs and is part of the Sound Blaster family of products. The Sound Blaster AWE32, introduced in March 1994, was a near full-length ISA sound c ...
being typical examples), while the
Sound Blaster Audigy
Sound Blaster Audigy is a product line of sound cards from Creative Technology. The flagship model of the Audigy family used the EMU10K2 audio DSP, an improved version of the SB-Live's EMU10K1, while the value/SE editions were built with a less ...
and the
Sound Blaster X-Fi
Sound Blaster X-Fi is a lineup of sound cards in Creative Technology's Sound Blaster series.
History
The series was launched in August 2005 as a lineup of PCI sound cards, which served as the introduction for their X-Fi audio processing chip ...
are more recent examples.
In 2006,
AGEIA
Ageia, founded in 2002, was a fabless semiconductor company. In 2004, Ageia acquired NovodeX, the company who created PhysX – a Physics Processing Unit chip capable of performing game physics calculations much faster than general purpose CPUs ...
announced an add-in card for computers that it called the
PhysX
PhysX is an open-source realtime physics engine middleware SDK developed by Nvidia as a part of Nvidia GameWorks software suite.
Initially, video games supporting PhysX were meant to be accelerated by PhysX PPU (expansion cards designed by ...
PPU. PhysX was designed to perform complex physics computations so that the
CPU and GPU do not have to perform these time-consuming calculations. It was designed for video games, although other mathematical uses could theoretically be developed for it. In 2008, Nvidia purchased the company and phased out the PhysX card line; the functionality was added through software allowing their GPUs to render PhysX on cores normally used for graphics processing, using their Nvidia PhysX engine software.
In 2006, BigFoot Systems unveiled a PCI add-in card they christened the KillerNIC which ran its own special Linux kernel on a FreeScale
PowerQUICC
PowerQUICC is the name for several PowerPC- and Power ISA-based microcontrollers from Freescale Semiconductor. They are built around one or more PowerPC cores and the Communications Processor Module ( QUICC Engine) which is a separate RISC core sp ...
running at 400 MHz, calling the FreeScale chip a
Network Processing Unit
A network processor is an integrated circuit which has a feature set specifically targeted at the networking application domain.
Network processors are typically software programmable devices and would have generic characteristics similar to gen ...
or NPU.
The
SpursEngine is a media-oriented add-in card with a coprocessor based on the
Cell
Cell most often refers to:
* Cell (biology), the functional basic unit of life
Cell may also refer to:
Locations
* Monastic cell, a small room, hut, or cave in which a religious recluse lives, alternatively the small precursor of a monastery ...
microarchitecture. The
SPUs are themselves vector coprocessors.
In 2008,
Khronos Group
The Khronos Group, Inc. is an open, non-profit, member-driven consortium of 170 organizations developing, publishing and maintaining royalty-free interoperability standards for 3D graphics, virtual reality, augmented reality, parallel computation ...
released the
OpenCL
OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), field-progra ...
with the aim to support general-purpose CPUs, ATI/AMD and Nvidia GPUs (and other accelerators) with a single common language for
compute kernel
In computing, a compute kernel is a routine compiled for high throughput accelerators (such as graphics processing units (GPUs), digital signal processors (DSPs) or field-programmable gate arrays (FPGAs)), separate from but used by a main progr ...
s.
In 2010s, some mobile computation devices had implemented the
sensor hub
A sensor hub is a microcontroller unit/ coprocessor/DSP set that helps to integrate data from different sensors and process them. This technology can help off-load these jobs from a product's main central processing unit, thus saving battery consum ...
as a coprocessor. Examples of coprocessors used for handling sensor integration in mobile devices include the
Apple M7
The Apple M-series coprocessors are motion coprocessors used by Apple Inc. in their mobile devices. First released in 2013, their function is to collect sensor data from integrated accelerometers, gyroscopes and compasses and offload the collect ...
and M8
motion coprocessors, the
Qualcomm Snapdragon Sensor Core and
Qualcomm Hexagon
Hexagon is the brand name for a family of digital signal processor (DSP) products by Qualcomm. Hexagon is also known as QDSP6, standing for “sixth generation digital signal processor.” According to Qualcomm, the Hexagon architecture is desig ...
, and the
Holographic Processing Unit
Holography is a technique that enables a wavefront to be recorded and later re-constructed. Holography is best known as a method of generating real three-dimensional images, but it also has a wide range of other applications. In principle, it ...
for the
Microsoft HoloLens
Microsoft HoloLens is an augmented reality (AR)/ mixed reality (MR) headset developed and manufactured by Microsoft. HoloLens runs the Windows Mixed Reality platform under the Windows 10 operating system. Some of the positional tracking techn ...
.
In 2012,
Intel
Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California. It is the world's largest semiconductor chip manufacturer by revenue, and is one of the developers of the x86 seri ...
announced the
Intel Xeon Phi coprocessor.
, various companies are developing coprocessors aimed at accelerating
artificial neural networks for vision and other cognitive tasks (e.g.
vision processing unit
A vision processing unit (VPU) is (as of 2018) an emerging class of microprocessor; it is a specific type of AI accelerator, designed to accelerate machine vision tasks.
Overview
Vision processing units are distinct from video processing units ...
s,
TrueNorth A cognitive computer is a computer that hardwires artificial intelligence and machine-learning algorithms into an integrated circuit (printed circuit board) that closely reproduces the behavior of the human brain. It generally adopts a neuromorphic ...
, and
Zeroth
0th or zeroth may refer to:
Mathematics, science and technology
* 0th or zeroth, an ordinal for the number zero
* 0th dimension, a topological space
* 0th element, of a data structure in computer science
* Zeroth (software), deep learning softw ...
), and as of 2018, such AI chips are in smartphones such as from Apple, and several Android phone vendors.
Other coprocessors
* The
MIPS architecture
MIPS (Microprocessor without Interlocked Pipelined Stages) is a family of reduced instruction set computer (RISC) instruction set architectures (ISA)Price, Charles (September 1995). ''MIPS IV Instruction Set'' (Revision 3.2), MIPS Technologies, ...
supports up to four coprocessor units, used for memory management, floating-point arithmetic, and two undefined coprocessors for other tasks such as graphics accelerators.
* Using
FPGA
A field-programmable gate array (FPGA) is an integrated circuit designed to be configured by a customer or a designer after manufacturinghence the term '' field-programmable''. The FPGA configuration is generally specified using a hardware de ...
(field-programmable gate arrays), custom coprocessors can be created for acceleration of particular processing tasks such as digital signal processing (e.g.
Zynq
Xilinx, Inc. ( ) was an American technology and semiconductor company that primarily supplied programmable logic devices. The company was known for inventing the first commercially viable field-programmable gate array (FPGA) and creating the fi ...
, combines
ARM
In human anatomy, the arm refers to the upper limb in common usage, although academically the term specifically means the upper arm between the glenohumeral joint (shoulder joint) and the elbow joint. The distal part of the upper limb between th ...
cores with FPGA on a single die).
*
TLS/SSL accelerators, used on
server
Server may refer to:
Computing
*Server (computing), a computer program or a device that provides functionality for other programs or devices, called clients
Role
* Waiting staff, those who work at a restaurant or a bar attending customers and su ...
s; such accelerators used to be cards, but in modern times are instructions for crypto in mainstream CPUs.
* Some
multi-core chips can be programmed so that one of their processors is the primary processor, and the other processors are supporting coprocessors.
* China's
Matrix 2000
Matrix most commonly refers to:
* ''The Matrix'' (franchise), an American media franchise
** ''The Matrix'', a 1999 science-fiction action film
** "The Matrix", a fictional setting, a virtual reality environment, within ''The Matrix'' (franchis ...
128 core PCI-e coprocessor is a proprietary accelerator that requires a CPU to run it, and has been employed in an upgrade of the 17,792 node
Tianhe-2
Tianhe-2 or TH-2 (, i.e. 'Milky Way 2') is a 33.86- petaflops supercomputer located in the National Supercomputer Center in Guangzhou, China. It was developed by a team of 1,300 scientists and engineers.
It was the world's fastest supercomputer ...
supercomputer (2 Intel Knights Bridge+ 2 Matrix 2000 each), now dubbed 2A, roughly doubling its speed at 95 petaflops, exceeding the
world's fastest supercomputer.
* A range of coprocessors were available for Acorn
BBC Micro
The British Broadcasting Corporation Microcomputer System, or BBC Micro, is a series of microcomputers and associated peripherals designed and built by Acorn Computers in the 1980s for the BBC Computer Literacy Project. Designed with an emphas ...
computers. Rather than special-purpose graphics or arithmetic devices, these were general-purpose CPUs (such as 8086, Zilog Z80, or 6502) to which particular types of task were assigned by the operating system, off-loading them from the computer's main CPU and resulting in acceleration. In addition, a BBC Micro fitted with a coprocessor was able to run machine code software designed for other systems, such as CP/M and DOS which are written for 8086 processors.
Trends
Over time CPUs have tended to grow to absorb the functionality of the most popular coprocessors. FPUs are now considered an integral part of a processors' main pipeline;
SIMD
Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal (part of the hardware design) and it can be directly accessible through an instruction set architecture (ISA), but it shoul ...
units gave multimedia its acceleration, taking over the role of various
DSP
DSP may refer to:
Computing
* Digital signal processing, the mathematical manipulation of an information signal
* Digital signal processor, a microprocessor designed for digital signal processing
* Yamaha DSP-1, a proprietary digital signal ...
accelerator cards; and even
GPUs
A graphics processing unit (GPU) is a specialized electronic circuit designed to manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, mobil ...
have become integrated on CPU dies. Nonetheless, specialized units remain popular away from desktop machines, and for additional power, and allow continued evolution independently of the main processor product lines.
See also
*
Multiprocessing
Multiprocessing is the use of two or more central processing units (CPUs) within a single computer system. The term also refers to the ability of a system to support more than one processor or the ability to allocate tasks between them. There ar ...
, the use of two or more CPUs within a single computer system
*
Torrenza Torrenza was an initiative announced by Advanced Micro Devices (AMD) in 2006 to improve support for the integration of specialized coprocessors in systems based on AMD Opteron microprocessors. Torrenza does not refer to a specific product or specif ...
, an initiative to implement coprocessor support for AMD processors
*
OpenCL
OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), field-progra ...
framework for writing programs that execute across heterogeneous platforms
*
Asymmetric multiprocessing An asymmetric multiprocessing (AMP or ASMP) system is a multiprocessor computer system where not all of the multiple interconnected central processing units (CPUs) are treated equally. For example, a system might allow (either at the hardware or ope ...
*
AI accelerator
An AI accelerator is a class of specialized hardware accelerator or computer system designed to accelerate artificial intelligence and machine learning applications, including artificial neural networks and machine vision. Typical applications i ...
References
{{Authority control
Central processing unit
Heterogeneous computing
OpenCL compute devices