HOME
        TheInfoList






In computing, endianness is the order or sequence of bytes of a word of digital data in computer memory. Endianness is primarily expressed as big-endian (BE) or little-endian (LE). A big-endian system stores the most significant byte of a word at the smallest memory address and the least significant byte at the largest. A little-endian system, in contrast, stores the least-significant byte at the smallest address. Endianness may also be used to describe the order in which the bits are transmitted over a communication channel. Bit-endianess is seldom used in other contexts.

Computers store information in various sized groups of binary bits. Each group is assigned a number, called its address, that the computer uses to access that data. On most modern computers, the smallest data group with an address is eight bits long and is called a byte. Larger groups comprise two or more bytes, for example, a 32-bit word contains four bytes. There are two possible ways a computer could number the individual bytes in a larger group, starting at either end. Both types of endianness are in widespread use in digital electronic engineering. The initial choice of endianness of a new design is often arbitrary, but later technology revisions and updates perpetuate the existing endianness to maintain backward compatibility.

Internally, any given computer will work equally well regardless of what endianness it uses since its hardware will consistently use the same endianness to both store and load its data. For this reason, programmers and computer users normally ignore the endianness of the computer they are working with. However, endianness can become an issue when moving data external to the computer – as when transmitting data between different computers, or a programmer investigating internal computer bytes of data from a memory dump – and the endianness used differs from expectation. In these cases, the endianness of the data must be understood and accounted for. Bi-endianess is a feature supported by numerous computer architectures that feature switchable endianness in data fetches and stores or for instruction fetches.

Big-endianness is the dominant ordering in networking protocols, such as in the internet protocol suite, where it is referred to as network order, transmitting the most significant byte first. Conversely, little-endianness is the dominant ordering for processor architectures (x86, most ARM implementations, base RISC-V implementations) and their associated memory. File formats can use either ordering; some formats use a mixture of both.

The term may also be used more generally for the internal ordering of any representation, such as the digits in a numeral system or the sections of a date. Numbers in place-value notation are written with their digits in big-endian order, even in right-to-left scripts. Similarly, programming languages use big-endian digit ordering for numeric literals as well as big-endian way of speaking for bit-shift operations (namely “left” [to be associated with low address] shifts towards the MSB), regardless of the endianness of the target architecture.

While not allowed by C++, such type punning code is allowed as "implementation-defined" by the C11 standard[15] and commonly used[16] in code interacting with hardware.[17]

On the other hand, in some situations it may be useful to obtain an a

On the other hand, in some situations it may be useful to obtain an approximation of a multi-byte or multi-word value by reading only its most significant portion instead of the complete representation; a big-endian processor may read such an approximation using the same base-address that would be used for the full value.

Optimizations of this kind are not portable across systems of different endianness.

Some operations in positional number systems have a natural or preferred order in which the elementary steps are to be executed. This order may affect their performance on small-scale byte-addressable processors and microcontrollers.

However, high-performance processors usually fetch typical multi-byte operands from memory in the same amount of time they would have fetched a single byte, so the complexity of the hardware is not affected by the byte ordering.

Operat

However, high-performance processors usually fetch typical multi-byte operands from memory in the same amount of time they would have fetched a single byte, so the complexity of the hardware is not affected by the byte ordering.

As learnt in school, addition, subtraction, and multiplication start at the least significant digit position and propagate the carry to the subsequent more significant position. Addressing multi-digit data at its first (= smallest address) byte is the predominant addressing scheme. When this first byte contains the least significant digit – which is equivalent to little-endianness, then the implementation of these operations is marginally simpler.[note 4]

Operations starting at the most significant digit

Comparison and division start at the most significant digit and propagate a possible carry to the subsequent less significant digits. For fixed-length numerical values (typically of length 1,2,4,8,16), the implementation of these operations is marginally simpler on big-endian machines.

Operands of varying length

In the programming language C the lexicographical comparison of character strings has to be done by subroutines, which is frequently offered and implemented as a subroutine (e. g. strcmp).

Many big-endian processors contain hardware instructions for lexicographically comparing varying length character strings (e. g. the IBM System/360 and its successors).

Operations without impact

The PDP-11 is in principle a 16-bit little-endian system. The instructions to conve

The PDP-11 is in principle a 16-bit little-endian system. The instructions to convert between floating-point and integer values in the optional floating-point processor of the PDP-11/45, PDP-11/70, and in some later processors, stored 32-bit "double precision integer long" values with the 16-bit halves swapped from the expected little-endian order. The UNIX C compiler used the same format for 32-bit long integers. This ordering is known as PDP-endian.[18]

A way to interpret this endianness is that it stores a 32-bit integer as two 16-bit words in big-endian, but the words themselves are little-endian (E.g. "jag cog sin" would be "gaj goc nis"):

Middle-endian machines like the Honeywell 316 above complicate this even further: the 32-bit value is stored as two 16-bit words 'hn' 'Jo' in little-endian, themselves with a big-endian notation (thus 'h' 'n' 'J' 'o').

This conflict between the memory arrangements of binary data and text is intrinsic to the nature of the little-endian convention.

Byte swapping

Byte-swapping consists of masking each byte and shifting them to the correct location. Many compilers provide built-ins that are likely to be compiled into native processor instructions (bswap/movbe), such as __builtin_bswap32. Software interfaces for swapping include:

  • Standard network endianness functions (from/to BE, up to 32-bit).[19] Windows has a 64-bit extension in winsock2.h.
  • BSD and Glibc endian.h functions (from/to BE and LE, up to 64-bit).[20]
  • macOS OSByteOrder.h macros (from/to BE and LE, up to 64-bit).

Files and filesystems

The recognition of endianness is important when reading a file or filesys

This conflict between the memory arrangements of binary data and text is intrinsic to the nature of the little-endian convention.

Byte-swapping consists of masking each byte and shifting them to the correct location. Many compilers provide built-ins that are likely to be compiled into native processor instructions (bswap/movbe), such as __builtin_bswap32. Software interfaces for swapping include:

  • Standard network endianness functions (from/to BE, up to 32-bit).[19] Windows has a 64-bit extension in winsock2.h.The recognition of endianness is important when reading a file or filesystem that was created on a computer with different endianness.

    Some CPU instruction sets provide native support for endian byte swapping, such as bswap[21] (x86 - 486 and later), and rev[22] (ARMv6 and later).

    Some compilers have built-in facilities for byte swapping. For example, the Intel Fortran compiler supports the non-standard CONVERT specifier when opening a file, e.g.: CPU instruction sets provide native support for endian byte swapping, such as bswap[21] (x86 - 486 and later), and rev[22] (ARMv6 and later).

    Some compilers have built-in facilities for byte swapping. For example, the Intel Fortran compiler supports the non-standard CONVERT specifier when opening a file, e.g.: OPEN(unit, CONVERT='BIG_ENDIAN',...).

    Some compilers have options for generating code that globally enable the conversion for all file IO operations. This permits the reuse of code on a system with the opposite endianness without code modification.

    Fortran sequential unformatted files created with one endianness usually cannot be read on a system using the other endianness because Fortran usually implements a record (defined as the data written by a single Fortran statement) as data preceded and succeeded by count fields, which are integers equal to the number of bytes in the data. An attempt to read such a file using Fortran on a system of the other endianness then results in a run-time error, because the count fields are incorrect. This problem can be avoided by writing out sequential binary files as opposed to sequential unformatted. Note however that it is relatively simple to write a program in another language (such as C or Python) that parses Fortran sequential unformatted files of "foreign" endianness and converts them to "native" endianness, by converting from the "foreign" endianness when reading the Fortran records and data.

    Unicode text can optionally start with a byte order mark (BOM) to signal the endianness of the file or stream. Its code point is U+FEFF. In UTF-32 for example, a big-endian file should start with 00 00 FE FF; a little-endian should start with FF FE 00 00.

    Application binary data formats, such as for example MATLAB .mat files, or the .bil data format, used in topography, are usually endianness-independent. This is achieved by storing the data always in one fixed endianness, or carrying with the data a switch to indicate the endianness.

    An example of the first case is the binary XLS file format that is portable between Windows and Mac systems and always little-endian, leaving the Mac application to swap the bytes on load and save when running on a big-endian Motorola 68K or PowerPC processor.[23]

    TIFF image files are an example of the second strategy, whose header instructs the application about endianness of their internal binary integers. If a file starts with the signature MM it means that integers are represented as big-endian, while II means little-endian. Those signatures need a single 16-bit word each, and they are palindromes (that is, they read the same forwards and backwards), so they are endianness independent. I stands for Intel and M stands for Motorola, the respective CPU providers of the IBM PC compatibles (Intel) and Apple Macintosh platforms (Motorola) in the 1980s. Intel CPUs are little-endian, while Motorola 680x0 CPUs are big-endian. This explicit signature allows a TIFF reader program to swap bytes if necessary when a given file was generated by a TIFF writer program running on a computer with a different endianness.

    As a consequence of its original implementation on the Intel 8080 platform, the operating system-independent File Allocation Table (FAT) file system is defined with little-endian byte ordering, even on platforms using another endiannes natively, necessitating byte-swap operations for maintaining the FAT.

    ZFS/OpenZFS combined file system and logical volume manager is known to provide adaptive endianness and to work with both big-endian and little-endian systems.[24]

    Many IETF RFCs use the term network order, meaning the order of transmission for bits and bytes over the wire in network protocols. Among others, the historic RFC 1700 (also known as Internet standard STD 2) has defined the network order for protocols in the Internet protocol suite to be big-endian, hence the use of the term "network byte order" for big-endian byte order.[25]

    However, not all protocols use big-endian byte order as the network order. The Server Message Block (SMB) protocol uses little-endian byte order. In CANopen, multi-byte parameters are always

    However, not all protocols use big-endian byte order as the network order. The Server Message Block (SMB) protocol uses little-endian byte order. In CANopen, multi-byte parameters are always sent least significant byte first (little-endian). The same is true for Ethernet Powerlink.[26]

    The Berkeley sockets API defines a set of functions to convert 16-bit and 32-bit integers to and from network byte order: the htons (host-to-network-short) and htonl (host-to-network-long) functions convert 16-bit and 32-bit values respectively from machine (host) to network order; the ntohs and ntohl functions convert from network to host order. These functions may be a no-op on a big-endian system.

    While the high-level network protocols usually consider the byte (mostly meant as octet) as their atomic unit, the lowest network protocols may deal with ordering of bits within a byte.

    Bit numbering is a concept similar to endianness, but on a level of bits, not bytes. Bit endianness or bit-level endianness refers to the transmission order of bits over a serial medium. The bit-level analogue of little-endian (least significant bit goes first) is used in RS-232, HDLC, Ethernet, and USB. Some protocols use the opposite ordering (e.g. Teletext, I2C, SMBus, PMBus, and SONET and SDH[27]). Usually, there exists a consistent view to the bits irrespective of their order in the byte, such that the latter becomes relevant only on a very low level. One exception is caused by the feature of some cyclic redundancy checks to detect all burst errors up to a known length, which would be spoiled if the bit order is different from the byte order on serial transmission.

    Apart from serialization, the terms bit endianness and bit-level endianness are seldom used, as computer architectures where each individual bit has a unique address are rare. Individual bits or bit fields are accessed via their numerical value or, in hig

    Apart from serialization, the terms bit endianness and bit-level endianness are seldom used, as computer architectures where each individual bit has a unique address are rare. Individual bits or bit fields are accessed via their numerical value or, in high-level programming languages, assigned names, the effects of which, however, may be machine dependent or lack software portability.