Run-length encoding (RLE) is a form of lossless data compression in which runs of data (sequences in which the same data value occurs in many consecutive data elements) are stored as a single data value and count, rather than as the original run. It is most efficient on data that contains many such runs, for example simple graphic images such as icons, line drawings, Conway's Game of Life patterns, and animations. For files that do not have many runs, RLE can increase the file size.

RLE may also refer to an early graphics file format supported by CompuServe for compressing black-and-white images, which was widely supplanted by the service's later Graphics Interchange Format (GIF). RLE also refers to a little-used image format in Windows 3.x, with the extension rle, which is a run-length encoded bitmap used to compress the Windows 3.x startup screen.


Example

Consider a screen containing plain black text on a solid white background. Writing W for a white pixel and B for a black pixel, a hypothetical scan line can be rendered with run-length encoding as follows:

    12W1B12W3B24W1B14W

This can be interpreted as a sequence of twelve Ws, one B, twelve Ws, three Bs, etc., and represents the original 67 characters in only 18. While the actual format used for the storage of images is generally binary rather than ASCII characters like this, the principle remains the same. Even binary data files can be compressed with this method; file format specifications often dictate repeated bytes in files as padding space. However, newer compression methods such as DEFLATE often use LZ77-based algorithms, a generalization of run-length encoding that can take advantage of runs of strings of characters (such as BWWBWWBWWBWW).

Run-length encoding can be expressed in multiple ways to accommodate data properties as well as additional compression algorithms. For instance, one popular method encodes run lengths for runs of two or more characters only, using an "escape" symbol to identify runs, or using the character itself as the escape, so that any time a character appears twice it denotes a run. On the previous example, this would give the following:

    WW12BWW12BB3WW24BWW14

This would be interpreted as a run of twelve Ws, a B, a run of twelve Ws, a run of three Bs, etc. In data where runs are less frequent, this can significantly improve the compression rate.

One other matter is the application of additional compression algorithms. Even with the runs extracted, the frequencies of different characters may be large, allowing for further compression; however, if the run lengths are written in the file in the locations where the runs occurred, the presence of these numbers interrupts the normal flow and makes the data harder to compress. To overcome this, some run-length encoders separate the data and escape symbols from the run lengths, so that the two can be handled independently. For the example data, this would result in two outputs, the string "WWBWWBBWWBWW" and the numbers (12,12,3,24,14).
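The three representations described above can be sketched in Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names (`rle_encode`, `rle_decode`, `rle_encode_doubled`, `split_streams`) are invented for this example, symbols are assumed never to be digits, and counts are written as ASCII text, whereas real image formats generally store them in binary.

```python
import re
from itertools import groupby


def rle_encode(data: str) -> str:
    """Encode every run as <count><symbol>, e.g. 'WWWB' -> '3W1B'."""
    return "".join(f"{len(list(run))}{symbol}" for symbol, run in groupby(data))


def rle_decode(encoded: str) -> str:
    """Invert rle_encode. Assumes symbols are never digits."""
    pairs = re.findall(r"(\d+)(\D)", encoded)
    return "".join(symbol * int(count) for count, symbol in pairs)


def rle_encode_doubled(data: str) -> str:
    """Variant: a doubled symbol acts as the escape and is followed by the
    run length; a symbol that occurs once is written as itself."""
    parts = []
    for symbol, run in groupby(data):
        length = len(list(run))
        parts.append(symbol if length == 1 else f"{symbol}{symbol}{length}")
    return "".join(parts)


def split_streams(data: str) -> tuple[str, list[int]]:
    """Variant: symbols and escapes go to one output, run lengths to another,
    so each stream can be compressed further on its own."""
    symbols, lengths = [], []
    for symbol, run in groupby(data):
        length = len(list(run))
        if length == 1:
            symbols.append(symbol)
        else:
            symbols.append(symbol * 2)
            lengths.append(length)
    return "".join(symbols), lengths


# The hypothetical scan line from the example: 67 pixels in total.
scan_line = "W" * 12 + "B" + "W" * 12 + "B" * 3 + "W" * 24 + "B" + "W" * 14

print(rle_encode(scan_line))          # 12W1B12W3B24W1B14W
print(rle_encode_doubled(scan_line))  # WW12BWW12BB3WW24BWW14
print(split_streams(scan_line))       # ('WWBWWBBWWBWW', [12, 12, 3, 24, 14])
```

Decoding the first form with `rle_decode` returns the original 67-character line. Note that on input with no runs at all, `rle_encode` doubles the size (every symbol gains a count), while `rle_encode_doubled` leaves it unchanged, which is why the escape variant does better when runs are infrequent.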


History and applications

Run-length encoding (RLE) schemes were employed in the transmission of analog television signals as far back as 1967. In 1983, run-length encoding was patented by Hitachi. RLE is particularly well suited to palette-based bitmap images such as computer icons, and was a popular image compression method on early online services such as CompuServe before the advent of more sophisticated formats such as GIF. It does not work well on continuous-tone images such as photographs, although JPEG uses it on the coefficients that remain after transforming and quantizing image blocks.

Common formats for run-length encoded data include Truevision TGA, PackBits, PCX and ILBM. The International Telecommunication Union also describes a standard to encode run-length colour for fax machines, known as T.45. The standard, which is combined with other techniques into Modified Huffman coding, is relatively efficient because most faxed documents consist mostly of white space, with occasional interruptions of black.


See also

* Kolakoski sequence
* Look-and-say sequence
* Comparison of graphics file formats
* Golomb coding
* Burrows–Wheeler transform
* Recursive indexing
* Run-length limited
* Bitmap index
* Forsyth–Edwards Notation, which uses run-length encoding for empty spaces in chess positions
* DEFLATE


External links


* Run-length encoding implemented in different programming languages (on Rosetta Code)