Zero ASIC Corporation, formerly Adapteva, Inc., is a fabless semiconductor company focusing on low-power many-core microprocessor design. The company was the second company to announce a design with 1,000 specialized processing cores on a single integrated circuit. Adapteva was founded in 2008 with the goal of bringing a tenfold advancement in floating-point performance per watt for the mobile device market. Products are based on its Epiphany multi-core multiple instruction, multiple data (MIMD) architecture. Its Parallella Kickstarter project, launched in September 2012, promoted "a supercomputer for everyone". The company name is a combination of "adapt" and the Hebrew word "Teva" meaning nature.

History

Adapteva was founded in March 2008 by Andreas Olofsson. The company was founded with the goal of bringing a 10× advancement in processing energy efficiency for the mobile device market. In May 2009, Olofsson had a prototype of a new type of massively parallel multi-core computer architecture. The initial prototype was implemented in 65 nm and had 16 independent microprocessor cores. The initial prototypes enabled Adapteva to secure US$1.5 million in series-A funding from BittWare, a company from Concord, New Hampshire, in October 2009. Adapteva's first commercial chip product started sampling to customers in early May 2011, and the company soon thereafter announced the capability to put up to 4,096 cores on a single chip. The ''Epiphany III'' was announced in October 2011 using 28 nm and 65 nm manufacturing processes.

Products

Adapteva's main product family is the Epiphany scalable multi-core MIMD architecture. The Epiphany architecture could accommodate chips with up to 4,096 RISC out-of-order processors, all sharing a single 32-bit flat memory space. Each processor in the Epiphany architecture is superscalar, with a 64 × 32-bit unified register file (integer or single-precision), operating at up to 1 GHz and capable of 2 GFLOPS (single-precision). Epiphany's RISC processors use a custom instruction set architecture (ISA) optimised for single-precision floating-point, but are programmable in high-level ANSI C using a standard GNU GCC toolchain.

Each RISC processor (in current implementations; not fixed in the architecture) has 32 KB of local memory. Code (possibly duplicated in each core) and stack space should be in that local memory; in addition, most temporary data should fit there for full speed. Data can also be accessed from other processor cores' local memory at a speed penalty, or from off-chip RAM with a much larger speed penalty.

Like the Sony/Toshiba/IBM Cell processor, the memory architecture does not employ an explicit hierarchy of hardware caches, but it has the additional benefit of supporting off-chip and inter-core loads and stores (which simplifies porting software to the architecture). It is a hardware implementation of partitioned global address space. This eliminates the need for complex cache coherency hardware, which places a practical limit on the number of cores in a traditional multicore system. The design allows the programmer to use foreknowledge of independent data access patterns to avoid the runtime cost of figuring this out. All processor nodes are connected through a network on chip, allowing efficient message passing.
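The idea of a single flat 32-bit address space partitioned among cores can be sketched roughly as follows. This is only an illustration: the 6/6/20 bit split (row, column, per-core offset) is an assumption chosen so that 64 × 64 = 4,096 cores fit into 32 bits, and the function names are hypothetical rather than part of any Adapteva toolchain.

```python
# Illustrative sketch of a partitioned global address space.
# ASSUMPTION: 6 row bits + 6 column bits + 20 offset bits = 32 bits.
ROW_BITS = 6
COL_BITS = 6
OFFSET_BITS = 32 - ROW_BITS - COL_BITS  # 20 bits: a 1 MiB window per core


def global_address(row, col, offset):
    """Pack a core's grid position and a local offset into one 32-bit address."""
    if not (0 <= row < (1 << ROW_BITS) and 0 <= col < (1 << COL_BITS)):
        raise ValueError("core coordinates out of range")
    if not (0 <= offset < (1 << OFFSET_BITS)):
        raise ValueError("offset does not fit in the per-core window")
    return (row << (COL_BITS + OFFSET_BITS)) | (col << OFFSET_BITS) | offset


def split_address(addr):
    """Recover (row, col, offset) from a packed 32-bit address."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    col = (addr >> OFFSET_BITS) & ((1 << COL_BITS) - 1)
    row = (addr >> (COL_BITS + OFFSET_BITS)) & ((1 << ROW_BITS) - 1)
    return row, col, offset


def is_local(addr, my_row, my_col):
    """True if the address falls in the calling core's own partition."""
    row, col, _ = split_address(addr)
    return (row, col) == (my_row, my_col)
```

Under these assumptions, a load or store whose address decodes to another core's coordinates would travel over the on-chip network instead of hitting local memory, which is where the speed penalty described above would come from.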

Scalability

The architecture is designed to scale almost indefinitely, with four ''e-links'' letting multiple chips be combined in a grid topology to form systems with thousands of cores.
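As a rough illustration of how a grid of chips scales, the sketch below counts the cores in a rectangular arrangement and lists which neighbours a chip would be linked to. It assumes that each of the four e-links connects to one adjacent chip (north, south, west, east) and that edge chips simply leave unused links unconnected; both the wiring and the function names are assumptions made for the example.

```python
def total_cores(chip_rows, chip_cols, cores_per_chip=16):
    """Number of cores in a chip_rows x chip_cols grid of identical chips."""
    return chip_rows * chip_cols * cores_per_chip


def chip_neighbours(row, col, chip_rows, chip_cols):
    """Map each link direction to the adjacent chip's (row, col), if one exists.

    ASSUMPTION: one e-link per compass direction; links on the edge of the
    grid have no neighbour and are omitted from the result.
    """
    candidates = {
        "north": (row - 1, col),
        "south": (row + 1, col),
        "west": (row, col - 1),
        "east": (row, col + 1),
    }
    return {
        direction: (r, c)
        for direction, (r, c) in candidates.items()
        if 0 <= r < chip_rows and 0 <= c < chip_cols
    }
```

For instance, `total_cores(8, 8)` gives 1,024 cores from sixty-four 16-core chips, and the same count follows from `total_cores(4, 4, cores_per_chip=64)` with sixteen 64-core chips.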

Multi-core coprocessors

On August 19, 2012, Adapteva posted some specifications and information about Epiphany multi-core coprocessors. In September 2012, a 16-core version, the Epiphany-III (E16G301), was produced using a 65 nm process (11.5 mm, 500 MHz chip; Linley Gwennap, "Adapteva: More Flops, Less Watts. Epiphany Offers Floating-Point Accelerator for Mobile Processors", ''Microprocessor Report'', June 2011), and engineering samples of the 64-core Epiphany-IV (E64G401) were produced using a 28 nm GlobalFoundries process (800 MHz).

The primary markets for the Epiphany multi-core architecture include:

* Smartphone applications such as real-time facial recognition, speech recognition, translation, and augmented reality.
* Next-generation supercomputers requiring drastically better energy efficiency to allow systems to scale to exaflop computing levels.
* Floating-point acceleration in embedded systems based on field-programmable gate array architectures.

Parallella project

In September 2012, Adapteva started project Parallella on Kickstarter, which was marketed as "''A Supercomputer for everyone''." Architecture reference manuals for the platform were published as part of the campaign to attract attention to the project. The US$750,000 funding goal was reached in a month, with a minimum contribution of US$99 entitling backers to obtain one device; although the initial deadline was set for May 2013, the first single-board computers with a 16-core Epiphany chip were finally shipped in December 2013. Size of board is planned to be . (Rick Merritt, "Adapteva Kickstarts Hundred-Dollar Supercomputer", EETimes, September 27, 2012.)

The Kickstarter campaign raised US$898,921. A further US$3 million goal was not reached, so no 64-core version of Parallella will be mass-produced. (Andrew Back, "Introducing the $99 Linux Supercomputer", Linux.com, January 24, 2013: "pledges of $99 or more being rewarded with at least one board with a 16-core device. ... The 16-core Epiphany chip delivers 26 GFLOPS of performance and with the entire Parallella computer consuming only 5 watts".) Kickstarter users who donated more than US$750 will get the "parallella-64" variant with a 64-core coprocessor (made from initial prototype manufacturing with a yield of 50 chips per wafer).

Epiphany V

By 2016, the firm had taped out a 1024-core 64-bit variant of their Epiphany architecture that featured larger local stores (64 KB), 64-bit addressing, double-precision floating-point arithmetic or SIMD single-precision, and 64-bit integer instructions, implemented in the 16 nm process node. This design included instruction set enhancements aimed at deep-learning and cryptography applications. In July 2017, Adapteva's founder became a DARPA MTO program manager and announced that the Epiphany V was "unlikely" to become available as a commercial product.

Performance

The latest Parallella boards with E16 Epiphany chips can be compared to many historic supercomputers in terms of raw performance (for example, the Cray 1, the first supercomputer per se, had a peak performance of 80 MFLOPS in 1976, and its successor, the Cray 2, had a peak performance of 1.9 GFLOPS in 1985), and can certainly be used for parallel code development. The architectural similarities to supercomputers (message passing and NUMA) make the Parallella a potentially useful development system compared to traditional SMP machines.

For a power envelope of 5 W, and in terms of GFLOPS/mm² of chip die space, the current E16 Epiphany chips provide vastly more performance than anything else available to date, with an architecture that is designed to scale and is applicable to more than just embarrassingly parallel GPU tasks (e.g. it would be capable of running the actor model with many concurrent, fully independent states). It is also suitable for DSP-like tasks where data could be fed directly on chip (from an FPGA or other ASIC) without having to create buffers in temporary memory, as is needed for a GPU, making it ideal for robotics and other intelligent sensor applications. The architecture also allows Parallella boards to be combined into a cluster with a fast inter-chip 'eMesh' interconnect, extending the logical grid of cores (creating almost unlimited scaling potential).

The 16-core Parallella achieves roughly 5.0 GFLOPS/W, the 64-core Epiphany-IV made in 28 nm is estimated at 50 GFLOPS/W (single-precision), and a 32-board system based on them achieves 15 GFLOPS/W. For comparison, top GPUs from AMD and Nvidia reached 10 GFLOPS/W for single-precision in the 2009–2011 timeframe.
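The efficiency figures can be checked with simple arithmetic. The sketch below is a back-of-the-envelope calculation only. It takes the 2 floating-point operations per cycle per core implied by the article's "1 GHz, 2 GFLOPS" figures, and the 26 GFLOPS and 5 W values quoted for the 16-core Parallella; the function names are hypothetical.

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle=2):
    """Theoretical peak: cores x clock (GHz) x floating-point ops per cycle.

    ASSUMPTION: flops_per_cycle=2 is inferred from "1 GHz and 2 GFLOPS" per core.
    """
    return cores * clock_ghz * flops_per_cycle


def gflops_per_watt(gflops, watts):
    """Energy efficiency as sustained or peak GFLOPS divided by power draw."""
    return gflops / watts


# Figures quoted in the article for the 16-core Parallella board.
parallella_efficiency = gflops_per_watt(26, 5)  # about 5.2 GFLOPS/W

# Ratio of the quoted 26 GFLOPS to the Cray 1's 80 MFLOPS (0.080 GFLOPS).
cray1_ratio = 26 / 0.080  # about 325x
```

The 5.2 GFLOPS/W result is consistent with the "roughly 5.0 GFLOPS/W" stated above. Note that the 5 W figure covers the entire board, so the chip alone would score higher.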

External links

* Parallella specifications