MAJC (Microprocessor Architecture for Java Computing) was a

Sun Microsystems Sun Microsystems, Inc. (Sun for short) was an American technology company that sold computers, computer components, software, and information technology services and created the Java programming language, the Solaris operating system, ZFS, the ...

multi-core, multithreaded,

very long instruction word Very long instruction word (VLIW) refers to instruction set architectures designed to exploit instruction level parallelism (ILP). Whereas conventional central processing units (CPU, processor) mostly allow programs to specify instructions to exe ...

(VLIW)

microprocessor A microprocessor is a computer processor where the data processing logic and control is included on a single integrated circuit, or a small number of integrated circuits. The microprocessor contains the arithmetic, logic, and control circu ...

design from the mid-to-late 1990s. Originally called the UltraJava processor, the MAJC processor was targeted at running

Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's List ...

programs, whose "late compiling" allowed Sun to make several favourable design decisions. The processor was released into two commercial graphical cards from Sun. Lessons learned regarding multi-threads on a multi-core processor provided a basis for later

OpenSPARC OpenSPARC is an open-source hardware project started in December 2005. The initial contribution to the project was Sun Microsystems' register-transfer level (RTL) Verilog code for a full 64-bit, 32- thread microprocessor, the UltraSPARC T1 processor ...

implementations such as the

UltraSPARC T1 Sun Microsystems' UltraSPARC T1 microprocessor, known until its 14 November 2005 announcement by its development codename "Niagara", is a multithreading, multicore CPU. Designed to lower the energy consumption of server computers, the CPU ty ...

Design elements

Move instruction scheduling to the compiler

Like other VLIW designs, notably

Intel Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California. It is the world's largest semiconductor chip manufacturer by revenue, and is one of the developers of the x86 seri ...

IA-64 IA-64 (Intel Itanium architecture) is the instruction set architecture (ISA) of the Itanium family of 64-bit Intel microprocessors. The basic ISA specification originated at Hewlett-Packard (HP), and was subsequently implemented by Intel in coll ...

(Itanium), MAJC attempted to improve performance by moving several expensive operations out of the processor and into the related compilers. In general, VLIW designs attempt to eliminate the ''instruction scheduler'', which often represents a relatively large amount of the overall processor's transistor budget. With this portion of the CPU removed to software, those transistors can be used for other purposes, often to add additional

functional unit In computer engineering, an execution unit (E-unit or EU) is a part of the central processing unit (CPU) that performs the operations and calculations as instructed by the computer program. It may have its own internal control sequence unit (not ...

s to process more instructions at once, or to increase the amount of

cache memory In computing, a cache ( ) is a hardware or software component that stores data so that future requests for that data can be served faster; the data stored in a cache might be the result of an earlier computation or a copy of data stored elsewher ...

to reduce the amount of time spent waiting for data to arrive from the much slower

main memory Computer data storage is a technology consisting of computer components and recording media that are used to retain digital data. It is a core function and fundamental component of computers. The central processing unit (CPU) of a computer ...

. Although MAJC shared these general concepts, it was unlike other VLIW designs, and processors in general, in a number of specific details.

Generalized functional units

Most processors include a number of separate "subprocessors" known as ''functional units'' that are tuned to operating on a particular type of data. For instance, a modern CPU typically has two or three functional units dedicated to processing

integer An integer is the number zero (), a positive natural number (, , , etc.) or a negative integer with a minus sign (−1, −2, −3, etc.). The negative numbers are the additive inverses of the corresponding positive numbers. In the language ...

data and logic instructions, known as ALUs, while other units handle

floating-point In computing, floating-point arithmetic (FP) is arithmetic that represents real numbers approximately, using an integer with a fixed precision, called the significand, scaled by an integer exponent of a fixed base. For example, 12.345 can b ...

numbers, the

FPU FPU may stand for: Universities * Florida Polytechnic University, in Lakeland, Florida, United States * Franklin Pierce University, in New Hampshire, United States * Fresno Pacific University, in California, United States * Fukui Prefectural Univ ...

s, or multimedia data,

SIMD Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal (part of the hardware design) and it can be directly accessible through an instruction set architecture (ISA), but it should ...

. MAJC instead used a single multi-purpose functional unit which could process any sort of data. In theory this approach meant that processing any one type of data would take longer, perhaps much longer, than processing the same data in a unit dedicated to that type of data. But on the other hand, these general-purpose units also meant that you did not end up with large portions of the CPU being unused because the program just happened to be doing many (for example) floating point calculations at that particular point in time.

Variable-length instruction packets

Another difference is that MAJC allowed for variable-length " instruction packets", which under VLIW contain a number of instructions that the compiler has determined can be run at the same time. Most VLIW architectures use fixed-length packets and when they cannot find an instruction to run they instead fill it with a NOP, which simply takes up space. Although variable-length instruction packets added some complexity to the CPU, it reduced code size and thus the number of expensive ''cache misses'' by increasing the amount of code in the cache at any one time.

Avoiding interlocks and stalls

The primary difference was the way that the MAJC design required the compiler to avoid ''interlocks'', pauses in execution while the results of one instruction need to be processed for the next to be able to run. For instance, if the processor is fed the instructions C = A + B, E = C + D, then the second instruction can be run only after the first completes. Most processors include locks in the design to stall out and reschedule these sorts of interlocked instructions, allowing some other instructions to run while the value of C is being calculated. However these interlocks are very expensive in terms of chip real-estate, and represents the majority of the instruction scheduler's logic. In order for the compiler to avoid these interlocks, it would have to know exactly how long each of these instructions would take to complete. For instance, if a particular implementation took three cycles to complete a floating-point multiplication, MAJC compilers would attempt to schedule in other instructions that took three cycles to complete and were not currently stalled. A change in the actual implementation might reduce this delay to only two instructions, however, and the compiler would need to be aware of this change. This means that the compiler was not tied to MAJC as a whole, but a ''particular implementation'' of MAJC, each individual CPU based on the MAJC design. This would normally be a serious logistical problem; consider the number of different variations of the Intel

IA-32 IA-32 (short for "Intel Architecture, 32-bit", commonly called i386) is the 32-bit version of the x86 instruction set architecture, designed by Intel and first implemented in the 80386 microprocessor in 1985. IA-32 is the first incarnation of ...

design for instance, each one would need its own dedicated compiler and the developer would have to produce a different binary for every one. However it is precisely this concept that drives the Java market–there is indeed a different compiler for each ISA, and it is installed on the client's machine instead of the developer's. The developer ships only a single

bytecode Bytecode (also called portable code or p-code) is a form of instruction set designed for efficient execution by a software interpreter. Unlike human-readable source code, bytecodes are compact numeric codes, constants, and references (norma ...

version of their program, and the user's machine compiles that to the underlying platform. In reality, scheduling instructions in this fashion turns out to be a very difficult problem. In real-world use, processors that attempt to do this scheduling at runtime encounter numerous events when the data needed is outside the cache, and there is no other instruction in the program that isn't also dependent on such data. In these cases the processor might stall for long periods, waiting on main memory. The VLIW approach does not help much in this regard; although the compiler might be able to spend more time looking for instructions to run, that doesn't mean that it can actually find one. MAJC attempted to address this problem through the ability to execute code from other threads if the current thread stalled on memory. Switching threads is normally a very expensive process known as a

context switch In computing, a context switch is the process of storing the state of a process or thread, so that it can be restored and resume execution at a later point, and then restoring a different, previously saved, state. This allows multiple processes ...

, and on a normal processor the switch would overwhelm any savings and generally slow the machine down. On MAJC, the system could hold the state for up to four threads in memory at the same time, reducing the context switch to a few instructions in length. This feature has since appeared on other processors; Intel refers to it as

HyperThreading Hyper-threading (officially called Hyper-Threading Technology or HT Technology and abbreviated as HTT or HT) is Intel's proprietary simultaneous multithreading (SMT) implementation used to improve parallelization of computations (doing multip ...

. MAJC took this idea one step further, and tried to prefetch data and instructions needed for threads while they were stalled. Most processors include similar functionality for parts of an instruction stream, known as

speculative execution Speculative execution is an optimization technique where a computer system performs some task that may not be needed. Work is done before it is known whether it is actually needed, so as to prevent a delay that would have to be incurred by doing t ...

, where the processor runs both of the possible outcomes of a branch while waiting for the deciding variable to calculate. MAJC instead continued to run the thread as if it were not stalled, using this execution to find and then load any data or instructions that would soon be needed when the thread stopped stalling. Sun referred to this as ''Space-Time Computing'' (STC), and it is a

speculative multithreading Thread Level Speculation (TLS), also known as Speculative Multithreading, or Speculative Parallelization, is a technique to speculatively execute a section of computer code that is anticipated to be executed later in parallel with the normal exec ...

design. Processors up to this point had tried to extract parallelism in a single thread, a technique that was reaching its limits in terms of diminishing returns. In seems that in a general sense the MAJC design attempted to avoid stalls by running ''across'' threads (and programs) as opposed to looking for parallelism in a single thread. VLIW is generally expected to be somewhat worse in terms of stalls because it is difficult to understand runtime behavior at compile-time, making the MAJC approach in dealing with this problem particularly interesting.

Implementations

Sun built a single model of the MAJC, the two-core ''MAJC 5200'', which was the heart of Sun's XVR-1000 and XVR-4000

workstation A workstation is a special computer designed for technical or scientific applications. Intended primarily to be used by a single user, they are commonly connected to a local area network and run multi-user operating systems. The term ''workstat ...

graphics boards. However many of the multicore and multithreading design ideas, notably in terms of using multiple threads to reduce stalling delays, worked their way into the Sun

SPARC SPARC (Scalable Processor Architecture) is a reduced instruction set computer (RISC) instruction set architecture originally developed by Sun Microsystems. Its design was strongly influenced by the experimental Berkeley RISC system developed ...

processor line, as well as designs from other companies. Additionally, the MAJC idea of designing the processor to run as many threads as possible, as opposed to instructions, appears to be the basis of the later

(code-named ''Niagara'') design.

External links

Sun's MAJC and Intels's IA-64
{{Sun hardware Graphics chips Sun microprocessors Very long instruction word computing