History
The first version of the MIPS architecture was designed by MIPS Computer Systems for its R2000 microprocessor, the first MIPS implementation. Both MIPS and the R2000 were introduced together in 1985. When MIPS II was introduced, ''MIPS'' was renamed ''MIPS I'' to distinguish it from the new version. MIPS Computer Systems'Design
MIPS is a modular architecture supporting up to four coprocessors (CP0/1/2/3). In MIPS terminology, CP0 is the System Control Coprocessor (an essential part of the processor that is implementation-defined in MIPS I–V), CP1 is an optionalVersions
MIPS I
MIPS is a load/store architecture (also known as a ''register-register architecture''); except for the load/store instructions used to accessRegisters
MIPS I has thirty-two 32-bit general-purpose registers (GPR). Register is hardwired to zero and writes to it are discarded. Register is the link register. For integer multiplication and division instructions, which run asynchronously from other instructions, a pair of 32-bit registers, ''HI'' and ''LO'', are provided. There is a small set of instructions for copying data between the general-purpose registers and the HI/LO registers. The program counter has 32 bits. The two low-order bits always contain zero since MIPS I instructions are 32 bits long and are aligned to their natural word boundaries.Instruction formats
Instructions are divided into three types: R (register), I (immediate), and J (jump). Every instruction starts with a 6-bit opcode. In addition to the opcode, R-type instructions specify three registers, a shift amount field, and a function field; I-type instructions specify two registers and a 16-bit immediate value; J-type instructions follow the opcode with a 26-bit jump target. The following are the three formats used for the core instruction set:CPU instructions
MIPS I has instructions that load and store 8-bit bytes, 16-bit halfwords, and 32-bit words. Only one addressing mode is supported: base + displacement. Since MIPS I is a 32-bit architecture, loading quantities fewer than 32 bits requires the datum to be either sign-extended or zero-extended to 32 bits. The load instructions suffixed by "unsigned" perform zero extension; otherwise sign extension is performed. Load instructions source the base from the contents of a GPR (rs) and write the result to another GPR (rt). Store instructions source the base from the contents of a GPR (rs) and the store data from another GPR (rt). All load and store instructions compute the memory address by summing the base with the sign-extended 16-bit immediate. MIPS I requires all memory accesses to be aligned to their natural word boundaries, otherwise an exception is signaled. To support efficient unaligned memory accesses, there are load/store word instructions suffixed by "left" or "right". All load instructions are followed by a load delay slot. The instruction in the load delay slot cannot use the data loaded by the load instruction. The load delay slot can be filled with an instruction that is not dependent on the load; a NOP is substituted if such an instruction cannot be found. MIPS I has instructions to perform addition and subtraction. These instructions source their operands from two GPRs (rs and rt), and write the result to a third GPR (rd). Alternatively, addition can source one of the operands from a 16-bit immediate (which is sign-extended to 32 bits). The instructions for addition and subtraction have two variants: by default, an exception is signaled if the result overflows; instructions with the "unsigned" suffix do not signal an exception. The overflow check interprets the result as a 32-bit two's complement integer. MIPS I has instructions to perform bitwise logical AND, OR, XOR, and NOR. These instructions source their operands from two GPRs and write the result to a third GPR. The AND, OR, and XOR instructions can alternatively source one of the operands from a 16-bit immediate (which is zero-extended to 32 bits). The "Set on ''relation''" instructions write one or zero to the destination register if the specified relation is true or false. These instructions source their operands from two GPRs or one GPR and a 16-bit immediate (which is sign-extended to 32 bits), and write the result to a third GPR. By default, the operands are interpreted as signed integers. The variants of these instructions that are suffixed with "unsigned" interpret the operands as unsigned integers (even those that source an operand from the sign-extended 16-bit immediate). The Load Upper Immediate instruction copies the 16-bit immediate into the high-order 16 bits of a GPR. It is used in conjunction with the Or Immediate instruction to load a 32-bit immediate into a register. MIPS I has instructions to perform left and right logical shifts and right arithmetic shifts. The operand is obtained from a GPR (rt), and the result is written to another GPR (rd). The shift distance is obtained from either a GPR (rs) or a 5-bit "shift amount" (the "sa" field). MIPS I has instructions for signed and unsigned integer multiplication and division. These instructions source their operands from two GPRs and write their results to a pair of 32-bit registers called HI and LO, since they may execute separately from (and concurrently with) the other CPU instructions. For multiplication, the high- and low-order halves of the 64-bit product is written to HI and LO (respectively). For division, the quotient is written to LO and the remainder to HI. To access the results, a pair of instructions (Move from HI and Move from LO) is provided to copy the contents of HI or LO to a GPR. These instructions are interlocked: reads of HI and LO do not proceed past an unfinished arithmetic instruction that will write to HI and LO. Another pair of instructions (Move to HI or Move to LO) copies the contents of a GPR to HI and LO. These instructions are used to restore HI and LO to their original state after exception handling. Instructions that read HI or LO must be separated by two instructions that do not write to HI or LO. All MIPS I control flow instructions are followed by aMIPS II
MIPS II removed the load delay slot and added several sets of instructions. For shared-memory multiprocessing, the ''Synchronize Shared Memory'', ''Load Linked Word'', and ''Store Conditional Word'' instructions were added. A set of Trap-on-Condition instructions were added. These instructions caused an exception if the evaluated condition is true. All existing branch instructions were given ''branch-likely'' versions that executed the instruction in the branch delay slot only if the branch is taken. These instructions improve performance in certain cases by allowing useful instructions to fill the branch delay slot. Doubleword load and store instructions for COP1–3 were added. Consistent with other memory access instructions, these loads and stores required the doubleword to be naturally aligned. The instruction set for the floating point coprocessor also had several instructions added to it. An IEEE 754-compliant floating-point square root instruction was added. It supported both single- and double-precision operands. A set of instructions that converted single- and double-precision floating-point numbers to 32-bit words were added. These complemented the existing conversion instructions by allowing the IEEE rounding mode to be specified by the instruction instead of the Floating Point Control and Status Register.MIPS III
MIPS III is a backwards-compatible extension of MIPS II that added support for 64-bit memory addressing and integer operations. The 64-bit data type is called a doubleword, and MIPS III extended the general-purpose registers, HI/LO registers, and program counter to 64 bits to support it. New instructions were added to load and store doublewords, to perform integer addition, subtraction, multiplication, division, and shift operations on them, and to move doubleword between the GPRs and HI/LO registers. For shared-memory multiprocessing, the ''Load Linked Double Word'', and ''Store Conditional Double Word'' instructions were added. Existing instructions originally defined to operate on 32-bit words were redefined, where necessary, to sign-extend the 32-bit results to permit words and doublewords to be treated identically by most instructions. Among those instructions redefined was ''Load Word''. In MIPS III it sign-extends words to 64 bits. To complement ''Load Word'', a version that zero-extends was added. The R instruction format's inability to specify the full shift distance for 64-bit shifts (its 5-bit shift amount field is too narrow to specify the shift distance for doublewords) required MIPS III to provide three 64-bit versions of each MIPS I shift instruction. The first version is a 64-bit version of the original shift instructions, used to specify constant shift distances of 0–31 bits. The second version is similar to the first, but adds 3210 the shift amount field's value so that constant shift distances of 32–63 bits can be specified. The third version obtains the shift distance from the six low-order bits of a GPR. MIPS III added a ''supervisor'' privilege level in between the existing kernel and user privilege levels. This feature only affected the implementation-defined System Control Processor (Coprocessor 0). MIPS III removed the Coprocessor 3 (CP3) support instructions, and reused its opcodes for the new doubleword instructions. The remaining coprocessors gained instructions to move doublewords between coprocessor registers and the GPRs. The floating general registers (FGRs) were extended to 64 bits and the requirement for instructions to use even-numbered register only was removed. This is incompatible with earlier versions of the architecture; a bit in the floating-point control/status register is used to operate the MIPS III floating-point unit (FPU) in a MIPS I- and II-compatible mode. The floating-point control registers were not extended for compatibility. The only new floating-point instructions added were those to copy doublewords between the CPU and FPU convert single- and double-precision floating-point numbers into doubleword integers and vice versa.MIPS IV
MIPS IV is the fourth version of the architecture. It is a superset of MIPS III and is compatible with all existing versions of MIPS. MIPS IV was designed to mainly improve floating-point (FP) performance. To improve access to operands, an indexed addressing mode (base + index, both sourced from GPRs) for FP loads and stores was added, as were prefetch instructions for performing memory prefetching and specifying cache hints (these supported both the base + offset and base + index addressing modes). MIPS IV added several features to improve instruction-level parallelism. To alleviate the bottleneck caused by a single condition bit, seven condition code bits were added to the floating-point control and status register, bringing the total to eight. FP comparison and branch instructions were redefined so they could specify which condition bit was written or read (respectively); and the delay slot in between an FP branch that read the condition bit written to by a prior FP comparison was removed. Support for partial predication was added in the form of conditional move instructions for both GPRs and FPRs; and an implementation could choose between having precise or imprecise exceptions for IEEE 754 traps. MIPS IV added several new FP arithmetic instructions for both single- and double-precision FPNs: fused-multiply add or subtract, reciprocal, and reciprocal square-root. The FP fused-multiply add or subtract instructions perform either one or two roundings (it is implementation-defined), to exceed or meet IEEE 754 accuracy requirements (respectively). The FP reciprocal and reciprocal square-root instructions do not comply with IEEE 754 accuracy requirements, and produce results that differ from the required accuracy by one or two units of last place (it is implementation defined). These instructions serve applications where instruction latency is more important than accuracy.MIPS V
MIPS V added a new data type, the Paired Single (PS), which consisted of two single-precision (32-bit) floating-point numbers stored in the existing 64-bit floating-point registers. Variants of existing floating-point instructions for arithmetic, compare and conditional move were added to operate on this data type in a SIMD fashion. New instructions were added for loading, rearranging and converting PS data. It was the first instruction set to exploit floating-point SIMD with existing resources.MIPS32/MIPS64
The first release of MIPS32, based on MIPS II, added conditional moves, prefetch instructions, and other features from the R4000 and R5000 families of 64-bit processors. The first release of MIPS64 adds a MIPS32 mode to run 32-bit code. The MUL and MADD ( multiply-add) instructions, previously available in some implementations, were added to the MIPS32 and MIPS64 specifications, as were cache control instructions. For the purpose of cache control, bothSYNC
and SYNCI
instructions were prepared.
MIPS32/MIPS64 Release 6 in 2014 added the following:
* a new family of branches with no delay slot:
** unconditional branches (BC) and branch-and-link (BALC) with a 26-bit offset,
** conditional branch on zero/non-zero with a 21-bit offset,
** full set of signed and unsigned conditional branches compare between two registers (e.g. BGTUC) or a register against zero (e.g. BGTZC),
** full set of branch-and-link which compare a register against zero (e.g. BGTZALC).
* index jump instructions with no delay slot designed to support large absolute addresses.
* instructions to load 16-bit immediates at bit position 16, 32 or 48, allowing to easily generate large constants.
* PC-relative load instructions, as well as address generation with large (PC-relative) offsets.
* bit-reversal and byte-alignment instructions (previously only available with the DSP extension).
* multiply and divide instructions redefined so that they use a single register for their result).
* instructions generating truth values now generate all zeroes or all ones instead of just clearing/setting the 0-bit,
* instructions using a truth value now only interpret all-zeroes as false instead of just looking at the 0-bit.
Removed infrequently used instructions:
* some conditional moves
* ''branch likely'' instructions (deprecated in previous releases).
* integer overflow trapping instructions with 16-bit immediate
* integer accumulator instructions (together HI/LO registers, moved to the DSP Application-Specific Extension)
* unaligned load instructions (LWL and LWR), (requiring that most ordinary loads and stores support misaligned access, possibly via trapping and with the addition of a new instruction (BALIGN))
Reorganized the instruction encoding, freeing space for future expansions.
microMIPS
The microMIPS32/64 architectures are supersets of the MIPS32 and MIPS64 architectures (respectively) designed to replace the MIPS16e application-specific extension (ASE). A disadvantage of MIPS16e is that it requires a mode switch before any of its 16-bit instructions can be processed. microMIPS adds versions of the most-frequently used 32-bit instructions that are encoded as 16-bit instructions. This allows programs to intermix 16- and 32-bit instructions without having to switch modes. microMIPS was introduced alongside of MIPS32/64 Release 3, and each subsequent release of MIPS32/64 has a corresponding microMIPS32/64 version. A processor may implement microMIPS32/64 or both microMIPS32/64 and its corresponding MIPS32/64 subset. Starting with MIPS32/64 Release 6, support for MIPS16e ended, and microMIPS is the only form of code compression in MIPS.Application-specific extensions
The base MIPS32 and MIPS64 architectures can be supplemented with a number of optional architectural extensions, which are collectively referred to as ''application-specific extensions'' (ASEs). These ASEs provide features that improve the efficiency and performance of certain workloads, such asMIPS MCU
Enhancements for microcontroller applications. The MCU ASE (application-specific extension) has been developed to extend theMIPS16
MIPS16 is an Application-Specific Extension for MIPS I through to V designed by LSI Logic andMIPS Digital Signal Processing (DSP)
The DSP ASE is an optional extension to the MIPS32/MIPS64 Release 2 and newer instruction sets which can be used to accelerate a large range of "media" computations—particularly audio and video. The DSP module comprises a set of instructions and state in the integer pipeline and requires minimal additional logic to implement in MIPS processor cores. Revision 2 of the ASE was introduced in the second half of 2006. This revision adds extra instructions to the original ASE, but is otherwise backwards-compatible with it. Unlike the bulk of the MIPS architecture, it's a fairly irregular set of operations, many chosen for a particular relevance to some key algorithm. Its main novel features (vs original MIPS32): * Saturating arithmetic (when a calculation overflows, deliver the representable number closest to the non-overflowed answer). * Fixed-point arithmetic on signed 32- and 16-bit fixed-point fractions with a range of -1 to +1 (these are widely called "Q31" and "Q15"). * The existing integer multiplication and multiply-accumulate instructions, which deliver results into a double-size accumulator (called "hi/lo" and 64 bits on MIPS32 CPUs). The DSP ASE adds three more accumulators, and some different flavours of multiply-accumulate. * SIMD instructions operating on 4 x unsigned bytes or 2 x 16-bit values packed into a 32-bit register (the 64-bit variant of the DSP ASE supports larger vectors, too). * SIMD operations are basic arithmetic, shifts and some multiply-accumulate type operations.MIPS SIMD architecture (MSA)
Instruction set extensions designed to accelerate multimedia. * 32 vector registers of 16 x 8-bit, 8 x 16-bit, 4 x 32-bit, and 2 x 64 bit vector elements * Efficient vector parallel arithmetic operations on integer, fixed-point and floating-point data * Operations on absolute value operands * Rounding and saturation options available * Full precision multiply and multiply-add * Conversions between integer, floating-point, and fixed-point data * Complete set of vector-level compare and branch instructions with no condition flag * Vector (1D) and array (2D) shuffle operations * Typed load and store instructions for endian-independent operation * IEEE Standard for Floating-Point Arithmetic 754-2008 compliant * Element precise floating-point exception signaling * Pre-defined scalable extensions for chips with more gates/transistors * Accelerates compute-intensive applications in conjunction with leveraging generic compiler support * Software-programmable solution for consumer electronics applications or functions not covered by dedicated hardware * Emerging data mining, feature extraction, image and video processing, and human-computer interaction applications * High-performance scientific computingMIPS virtualization
Hardware supported virtualization technology.MIPS multi-threading
Each multi-threaded MIPS core can support up to two VPEs (Virtual Processing Elements) which share a single pipeline as well as other hardware resources. However, since each VPE includes a complete copy of the processor state as seen by the software system, each VPE appears as a complete standalone processor to an SMP Linux operating system. For more fine-grained thread processing applications, each VPE is capable of supporting up to nine TCs allocated across two VPEs. The TCs share a common execution unit but each has its own program counter and core register files so that each can handle a thread from the software. The MIPS MT architecture also allows the allocation of processor cycles to threads, and sets the relative thread priorities with an optional Quality of Service ( QoS) manager block. This enables two prioritization mechanisms that determine the flow of information across the bus. The first mechanism allows the user to prioritize one thread over another. The second mechanism is used to allocate a specified ratio of the cycles to specific threads over time. The combined use of both mechanisms allows effective allocation of bandwidth to the set of threads, and better control of latencies. In real-time systems, system-level determinism is very critical, and the QoS block facilitates improvement of the predictability of a system. Hardware designers of advanced systems may replace the standard QoS block provided by MIPS Technologies with one that is specifically tuned for their application.SmartMIPS
SmartMIPS is an Application-Specific Extension (ASE) designed by Gemplus International and MIPS Technologies to improve performance and reduce memory consumption forMIPS Digital Media eXtension (MDMX)
Multimedia application accelerations that were common in the 1990s on RISC and CISC systems.Calling conventions
MIPS has had several calling conventions, especially on the 32-bit platform. The O32 ABI is the most commonly-used ABI, owing to its status as the original System V ABI for MIPS. It is strictly stack-based, with only four registers - available to pass arguments. Space on the stack is reserved in case the callee needs to save its arguments, but the registers are not stored there by the caller. The return value is stored in register ; a second return value may be stored in . The ABI took shape in 1990 and was last updated in 1994. This perceived slowness, along with an antique floating-point model with only 16 registers, has encouraged the proliferation of many other calling conventions. It is only defined for 32-bit MIPS, but GCC has created a 64-bit variation called O64. For 64-bit, the N64 ABI by Silicon Graphics is most commonly used. The most important improvement is that eight registers are now available for argument passing; it also increases the number of floating-point registers to 32. There is also an ILP32 version called N32, which uses 32-bit pointers for smaller code, analogous to the x32 ABI. Both run under the 64-bit mode of the CPU. The N32 and N64 ABIs pass the first eight arguments to a function in the registers -; subsequent arguments are passed on the stack. The return value (or a pointer to it) is stored in the registers ; a second return value may be stored in . In both the N32 and N64 ABIs all registers are considered to be 64-bits wide. A few attempts have been made to replace O32 with a 32-bit ABI that resembles N32 more. A 1995 conference came up with MIPS EABI, for which the 32-bit version was quite similar. EABI inspired MIPS Technologies to propose a more radical "NUBI" ABI additionally reuse argument registers for the return value. MIPS EABI is supported by GCC but not LLVM, and neither supports NUBI. For all of O32 and N32/N64, the return address is stored in a register. This is automatically set with the use of the JAL (jump and link) or JALR (jump and link register) instructions. The function prologue of a (non-leaf) MIPS subroutine pushes the return address (in ) to the stack. On both O32 and N32/N64 the stack grows downwards, but the N32/N64 ABIs require 64-bit alignment for all stack entries. The frame pointer () is optional and in practice rarely used except when the stack allocation in a function is determined at runtime, for example, by callingalloca()
.
For N32 and N64, the return address is typically stored 8 bytes before the stack pointer although this may be optional.
For the N32 and N64 ABIs, a function must preserve the - registers, the global pointer ( or ), the stack pointer ( or ) and the frame pointer (). The O32 ABI is the same except the calling function is required to save the register instead of the called function.
For multi-threaded code, the thread local storage pointer is typically stored in special hardware register and is accessed by using the mfhw (move from hardware) instruction. At least one vendor is known to store this information in the register which is normally reserved for kernel use, but this is not standard.
The and registers (–) are reserved for kernel use and should not be used by applications since these registers can be changed at any time by the kernel due to interrupts, context switches or other events.
Registers that are preserved across a call are registers that (by convention) will not be changed by a system call or procedure (function) call. For example, $s-registers must be saved to the stack by a procedure that needs to use them, and and are always incremented by constants, and decremented back after the procedure is done with them (and the memory they point to). By contrast, is changed automatically by any normal function call (ones that use jal), and $t-registers must be saved by the program before any procedure call (if the program needs the values inside them after the call).
The userspace calling convention of position-independent code on Linux additionally requires that when a function is called the register must contain the address of that function. This convention dates back to the System V ABI supplement for MIPS.
Uses
MIPS processors are used inSimulators
Open Virtual Platforms (OVP) includes the freely available for non-commercial use simulator OVPsim, a library of models of processors, peripherals and platforms, and APIs which enable users to develop their own models. The models in the library are open source, written in C, and include the MIPS 4K, 24K, 34K, 74K, 1004K, 1074K, M14K, microAptiv, interAptiv, proAptiv 32-bit cores and the MIPS 64-bit 5K range of cores. These models are created and maintained by Imperas and in partnership with MIPS Technologies have been tested and assigned the MIPS-Verified mark. Sample MIPS-based platforms include both bare metal environments and platforms for booting unmodified Linux binary images. These platforms–emulators are available as source or binaries and are fast, free for non-commercial usage, and are easy to use. OVPsim is developed and maintained by Imperas and is very fast (hundreds of million of instructions per second), and built to handle multicore homogeneous and heterogeneous architectures and systems. There is a freely available MIPS32 simulator (earlier versions simulated only the R2000/R3000) called SPIM for use in education. EduMIPS64 is a GPL graphical cross-platform MIPS64 CPU simulator, written in Java/Swing. It supports a wide subset of the MIPS64 ISA and allows the user to graphically see what happens in the pipeline when an assembly program is run by the CPU. MARS is another GUI-based MIPS emulator designed for use in education, specifically for use with Hennessy's ''Computer Organization and Design''. WebMIPS is a browser-based MIPS simulator with visual representation of a generic, pipelined processor. This simulator is quite useful for register tracking during step by step execution. QtMips provides a simple 5-stage pipeline visualization as well as cache principle visualization for basic computer architectures courses. It is available both as aSee also
* DLX * List of MIPS architecture processors * MIPS architecture processors *References
Further reading
* * * *External links