HOME

TheInfoList



OR:

In
computing Computing is any goal-oriented activity requiring, benefiting from, or creating computing machinery. It includes the study and experimentation of algorithmic processes, and development of both hardware and software. Computing has scientific, ...
, floating-point arithmetic (FP) is
arithmetic Arithmetic () is an elementary part of mathematics that consists of the study of the properties of the traditional operations on numbers—addition, subtraction, multiplication, division, exponentiation, and extraction of roots. In the 19th c ...
that represents
real number In mathematics, a real number is a number that can be used to measurement, measure a ''continuous'' one-dimensional quantity such as a distance, time, duration or temperature. Here, ''continuous'' means that values can have arbitrarily small var ...
s approximately, using an
integer An integer is the number zero (), a positive natural number (, , , etc.) or a negative integer with a minus sign ( −1, −2, −3, etc.). The negative numbers are the additive inverses of the corresponding positive numbers. In the language ...
with a fixed precision, called the
significand The significand (also mantissa or coefficient, sometimes also argument, or ambiguously fraction or characteristic) is part of a number in scientific notation or in floating-point representation, consisting of its significant digits. Depending on ...
, scaled by an integer
exponent Exponentiation is a mathematical operation, written as , involving two numbers, the '' base'' and the ''exponent'' or ''power'' , and pronounced as " (raised) to the (power of) ". When is a positive integer, exponentiation corresponds to re ...
of a fixed base. For example, 12.345 can be represented as a base-ten floating-point number: 12.345 = \underbrace_\text \times \underbrace_\text\!\!\!\!\!\!^ In practice, most floating-point systems use base two, though base ten ( decimal floating point) is also common. The term ''floating point'' refers to the fact that the number's radix point can "float" anywhere to the left, right, or between the significant digits of the number. This position is indicated by the exponent, so floating point can be considered a form of
scientific notation Scientific notation is a way of expressing numbers that are too large or too small (usually would result in a long string of digits) to be conveniently written in decimal form. It may be referred to as scientific form or standard index form, o ...
. A floating-point system can be used to represent, with a fixed number of digits, numbers of very different
orders of magnitude An order of magnitude is an approximation of the logarithm of a value relative to some contextually understood reference value, usually 10, interpreted as the base of the logarithm and the representative of values of magnitude one. Logarithmic di ...
— such as the number of meters between galaxies or between protons in an atom. For this reason, floating-point arithmetic is often used to allow very small and very large real numbers that require fast processing times. The result of this dynamic range is that the numbers that can be represented are not uniformly spaced; the difference between two consecutive representable numbers varies with their exponent. Over the years, a variety of floating-point representations have been used in computers. In 1985, the
IEEE 754 The IEEE Standard for Floating-Point Arithmetic (IEEE 754) is a technical standard for floating-point arithmetic established in 1985 by the Institute of Electrical and Electronics Engineers (IEEE). The standard addressed many problems found i ...
Standard for Floating-Point Arithmetic was established, and since the 1990s, the most commonly encountered representations are those defined by the IEEE. The speed of floating-point operations, commonly measured in terms of
FLOPS In computing, floating point operations per second (FLOPS, flops or flop/s) is a measure of computer performance, useful in fields of scientific computations that require floating-point calculations. For such cases, it is a more accurate mea ...
, is an important characteristic of a
computer system A computer is a machine that can be programmed to carry out sequences of arithmetic or logical operations ( computation) automatically. Modern digital electronic computers can perform generic sets of operations known as programs. These prog ...
, especially for applications that involve intensive mathematical calculations. A
floating-point unit In computing, floating-point arithmetic (FP) is arithmetic that represents real numbers approximately, using an integer with a fixed precision, called the significand, scaled by an integer exponent of a fixed base. For example, 12.345 can b ...
(FPU, colloquially a math coprocessor) is a part of a computer system specially designed to carry out operations on floating-point numbers.


Overview


Floating-point numbers

A
number representation A numeral system (or system of numeration) is a writing system for expressing numbers; that is, a mathematical notation for representing numbers of a given set, using digits or other symbols in a consistent manner. The same sequence of symbo ...
specifies some way of encoding a number, usually as a string of digits. There are several mechanisms by which strings of digits can represent numbers. In common mathematical notation, the digit string can be of any length, and the location of the radix point is indicated by placing an explicit "point" character (dot or comma) there. If the radix point is not specified, then the string implicitly represents an
integer An integer is the number zero (), a positive natural number (, , , etc.) or a negative integer with a minus sign ( −1, −2, −3, etc.). The negative numbers are the additive inverses of the corresponding positive numbers. In the language ...
and the unstated radix point would be off the right-hand end of the string, next to the least significant digit. In fixed-point systems, a position in the string is specified for the radix point. So a fixed-point scheme might be to use a string of 8 decimal digits with the decimal point in the middle, whereby "00012345" would represent 0001.2345. In
scientific notation Scientific notation is a way of expressing numbers that are too large or too small (usually would result in a long string of digits) to be conveniently written in decimal form. It may be referred to as scientific form or standard index form, o ...
, the given number is scaled by a power of 10, so that it lies within a certain range—typically between 1 and 10, with the radix point appearing immediately after the first digit. The scaling factor, as a power of ten, is then indicated separately at the end of the number. For example, the orbital period of
Jupiter Jupiter is the fifth planet from the Sun and the largest in the Solar System. It is a gas giant with a mass more than two and a half times that of all the other planets in the Solar System combined, but slightly less than one-thousandt ...
's moon Io is seconds, a value that would be represented in standard-form scientific notation as seconds. Floating-point representation is similar in concept to scientific notation. Logically, a floating-point number consists of: * A signed (meaning positive or negative) digit string of a given length in a given base (or radix). This digit string is referred to as the ''
significand The significand (also mantissa or coefficient, sometimes also argument, or ambiguously fraction or characteristic) is part of a number in scientific notation or in floating-point representation, consisting of its significant digits. Depending on ...
'', ''mantissa'', or ''coefficient''. The length of the significand determines the ''precision'' to which numbers can be represented. The radix point position is assumed always to be somewhere within the significand—often just after or just before the most significant digit, or to the right of the rightmost (least significant) digit. This article generally follows the convention that the radix point is set just after the most significant (leftmost) digit. * A signed integer
exponent Exponentiation is a mathematical operation, written as , involving two numbers, the '' base'' and the ''exponent'' or ''power'' , and pronounced as " (raised) to the (power of) ". When is a positive integer, exponentiation corresponds to re ...
(also referred to as the ''characteristic'', or ''scale''), which modifies the magnitude of the number. To derive the value of the floating-point number, the ''significand'' is multiplied by the ''base'' raised to the power of the ''exponent'', equivalent to shifting the radix point from its implied position by a number of places equal to the value of the exponent—to the right if the exponent is positive or to the left if the exponent is negative. Using base-10 (the familiar decimal notation) as an example, the number , which has ten decimal digits of precision, is represented as the significand together with 5 as the exponent. To determine the actual value, a decimal point is placed after the first digit of the significand and the result is multiplied by to give , or . In storing such a number, the base (10) need not be stored, since it will be the same for the entire range of supported numbers, and can thus be inferred. Symbolically, this final value is: \frac \times b^e, where is the significand (ignoring any implied decimal point), is the precision (the number of digits in the significand), is the base (in our example, this is the number ''ten''), and is the exponent. Historically, several number bases have been used for representing floating-point numbers, with base two ( binary) being the most common, followed by base ten ( decimal floating point), and other less common varieties, such as base sixteen ( hexadecimal floating point), base eight (octal floating point), base four (quaternary floating point), base three ( balanced ternary floating point) and even base 256 and base . A floating-point number is a
rational number In mathematics, a rational number is a number that can be expressed as the quotient or fraction of two integers, a numerator and a non-zero denominator . For example, is a rational number, as is every integer (e.g. ). The set of all ra ...
, because it can be represented as one integer divided by another; for example is (145/100)×1000 or /100. The base determines the fractions that can be represented; for instance, 1/5 cannot be represented exactly as a floating-point number using a binary base, but 1/5 can be represented exactly using a decimal base (, or ). However, 1/3 cannot be represented exactly by either binary (0.010101...) or decimal (0.333...), but in base 3, it is trivial (0.1 or 1×3−1) . The occasions on which infinite expansions occur depend on the base and its prime factors. The way in which the significand (including its sign) and exponent are stored in a computer is implementation-dependent. The common IEEE formats are described in detail later and elsewhere, but as an example, in the binary single-precision (32-bit) floating-point representation, p = 24, and so the significand is a string of 24
bit The bit is the most basic unit of information in computing and digital communications. The name is a portmanteau of binary digit. The bit represents a logical state with one of two possible values. These values are most commonly represented a ...
s. For instance, the number π's first 33 bits are: 11001001\ 00001111\ 1101101\underline\ 10100010\ 0. In this binary expansion, let us denote the positions from 0 (leftmost bit, or most significant bit) to 32 (rightmost bit). The 24-bit significand will stop at position 23, shown as the underlined bit above. The next bit, at position 24, is called the ''round bit'' or ''rounding bit''. It is used to round the 33-bit approximation to the nearest 24-bit number (there are specific rules for halfway values, which is not the case here). This bit, which is in this example, is added to the integer formed by the leftmost 24 bits, yielding: 11001001\ 00001111\ 1101101\underline. When this is stored in memory using the IEEE 754 encoding, this becomes the
significand The significand (also mantissa or coefficient, sometimes also argument, or ambiguously fraction or characteristic) is part of a number in scientific notation or in floating-point representation, consisting of its significant digits. Depending on ...
. The significand is assumed to have a binary point to the right of the leftmost bit. So, the binary representation of π is calculated from left-to-right as follows: \begin &\left(\sum_^ \text_n \times 2^\right) \times 2^e \\ = &\left(1 \times 2^ + 1 \times 2^ + 0 \times 2^ + 0 \times 2^ + 1 \times2^ + \cdots + 1 \times 2^\right) \times 2^1 \\ \approx &1.5707964 \times 2 \\ \approx &3.1415928 \end where is the precision ( in this example), is the position of the bit of the significand from the left (starting at and finishing at here) and is the exponent ( in this example). It can be required that the most significant digit of the significand of a non-zero number be non-zero (except when the corresponding exponent would be smaller than the minimum one). This process is called ''normalization''. For binary formats (which uses only the digits and ), this non-zero digit is necessarily . Therefore, it does not need to be represented in memory; allowing the format to have one