Run-length encoding (RLE) is a form of lossless data compression in which ''runs'' of data (sequences in which the same data value occurs in many consecutive data elements) are stored as a single data value and count, rather than as the original run. It is most efficient on data that contains many such runs, for example simple graphic images such as icons, line drawings, Conway's Game of Life patterns, and animations. For files that do not have many runs, RLE can increase the file size.

RLE may also refer to an early graphics file format supported by CompuServe for compressing black-and-white images, which was widely supplanted by CompuServe's later Graphics Interchange Format (GIF). RLE also refers to a little-used image format in Windows 3.x, with the extension rle: a run-length encoded bitmap used to compress the Windows 3.x startup screen.


Example

Consider a screen containing plain black text on a solid white background. There will be many long runs of white pixels in the blank space, and many short runs of black pixels within the text. A hypothetical scan line, with B representing a black pixel and W representing white, might read as follows:

: WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW

With a run-length encoding (RLE) data compression algorithm applied to this scan line, it can be rendered as follows:

: 12W1B12W3B24W1B14W

This can be interpreted as a sequence of twelve Ws, one B, twelve Ws, three Bs, twenty-four Ws, one B, and fourteen Ws. It represents the original 67 characters in only 18. While the actual format used for the storage of images is generally binary rather than ASCII characters like this, the principle remains the same. Even binary data files can be compressed with this method, because file format specifications often dictate repeated bytes in files as padding space. However, newer compression methods such as DEFLATE often use LZ77-based algorithms, a generalization of run-length encoding that can take advantage of runs of strings of characters (such as BWWBWWBWWBWW).

Run-length encoding can be expressed in multiple ways to accommodate data properties as well as additional compression algorithms. For instance, one popular method encodes run lengths only for runs of two or more characters, either using an "escape" symbol to identify runs or using the character itself as the escape, so that any time a character appears twice it denotes a run. On the previous example, the second approach gives the following:

: WW12BWW12BB3WW24BWW14

This is interpreted as a run of twelve Ws, a B, a run of twelve Ws, a run of three Bs, a run of twenty-four Ws, a B, and a run of fourteen Ws. In data where runs are less frequent, this can significantly improve the compression rate, because a single character no longer needs a count.

Another consideration is the application of additional compression algorithms. Even with the runs extracted, the frequencies of the different characters may still differ considerably, allowing for further compression. However, if the run lengths are written in the file at the locations where the runs occurred, these numbers interrupt the normal flow of the data and make it harder to compress. To overcome this, some run-length encoders separate the data and escape symbols from the run lengths, so that the two can be handled independently. For the example data, this results in two outputs: the string "WWBWWBBWWBWW" and the numbers (12,12,3,24,14).
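The three textual schemes described in this section can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function names (rle_encode, rle_decode, rle_encode_doubled, split_streams) are hypothetical, and the sketch assumes that the input symbols are never digits, since digits are reserved for the run lengths.

```python
from itertools import groupby


def rle_encode(data: str) -> str:
    """Count-then-symbol form: every run becomes '<length><symbol>'."""
    return "".join(f"{len(list(group))}{symbol}" for symbol, group in groupby(data))


def rle_decode(encoded: str) -> str:
    """Inverse of rle_encode; assumes symbols are never digits."""
    out, count = [], ""
    for ch in encoded:
        if ch.isdigit():
            count += ch  # accumulate a possibly multi-digit run length
        else:
            out.append(ch * int(count))
            count = ""
    return "".join(out)


def rle_encode_doubled(data: str) -> str:
    """Doubled-symbol form: a run of two or more is written as the symbol
    twice followed by the run length; a single symbol is written bare."""
    parts = []
    for symbol, group in groupby(data):
        n = len(list(group))
        parts.append(symbol if n == 1 else f"{symbol}{symbol}{n}")
    return "".join(parts)


def split_streams(data: str):
    """Doubled-symbol form with the run lengths kept in a separate list."""
    symbols, lengths = [], []
    for symbol, group in groupby(data):
        n = len(list(group))
        if n == 1:
            symbols.append(symbol)
        else:
            symbols.append(symbol * 2)
            lengths.append(n)
    return "".join(symbols), lengths


# The scan line from the example, built from its run lengths (67 characters).
scan_line = "W" * 12 + "B" + "W" * 12 + "B" * 3 + "W" * 24 + "B" + "W" * 14

print(rle_encode(scan_line))          # 12W1B12W3B24W1B14W
print(rle_encode_doubled(scan_line))  # WW12BWW12BB3WW24BWW14
print(split_streams(scan_line))       # ('WWBWWBBWWBWW', [12, 12, 3, 24, 14])
print(rle_encode("ABCD"))             # 1A1B1C1D
```

The last call illustrates why data without runs can grow under the count-then-symbol form: the four-character input "ABCD" becomes eight characters, whereas the doubled-symbol form would leave it unchanged.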


History and applications

Run-length encoding (RLE) schemes were employed in the transmission of analog television signals as far back as 1967. In 1983, run-length encoding was patented by Hitachi. RLE is particularly well suited to palette-based bitmap images such as computer icons, and was a popular image compression method on early online services such as CompuServe before the advent of more sophisticated formats such as GIF. It does not work well on continuous-tone images such as photographs, although JPEG uses it on the coefficients that remain after transforming and quantizing image blocks.

Common formats for run-length encoded data include Truevision TGA, PackBits (by Apple, used in MacPaint), PCX and ILBM.

The International Telecommunication Union also describes a standard for run-length colour encoding for fax machines, known as T.45. The standard, which is combined with other techniques into Modified Huffman coding, is relatively efficient because most faxed documents consist largely of white space, with occasional interruptions of black.


See also

* Kolakoski sequence
* Look-and-say sequence
* Comparison of graphics file formats
* Golomb coding
* Burrows–Wheeler transform
* Recursive indexing
* Run-length limited
* Bitmap index
* Forsyth–Edwards Notation, which uses run-length encoding for empty spaces in chess positions
* DEFLATE


External links


* Run-length encoding implemented in different programming languages (on Rosetta Code)
* Single Header Run-Length Encoding Library: smallest possible implementation (about 20 SLoC) in ANSI C. FOSS, compatible with Truevision TGA; also supports 8, 16, 24 and 32 bit elements.