Run-length limited or RLL coding is a
line coding
In telecommunication, a line code is a pattern of voltage, current, or photons used to represent digital data transmitted down a communication channel or written to a storage medium. This repertoire of signals is usually called a constrained co ...
technique that is used to send arbitrary data over a
communications channel
A communication channel refers either to a physical transmission medium such as a wire, or to a logical connection over a multiplexed medium such as a radio channel in telecommunications and computer networking. A channel is used for informat ...
with
bandwidth
Bandwidth commonly refers to:
* Bandwidth (signal processing) or ''analog bandwidth'', ''frequency bandwidth'', or ''radio bandwidth'', a measure of the width of a frequency range
* Bandwidth (computing), the rate of data transfer, bit rate or thr ...
limits. RLL codes are defined by four main parameters: ''m'', ''n'', ''d'', ''k''. The first two, ''m''/''n'', refer to the rate of the code, while the remaining two specify the minimal ''d'' and maximal ''k'' number of zeroes between consecutive ones. This is used in both
telecommunication
Telecommunication is the transmission of information by various types of technologies over wire, radio, optical, or other electromagnetic systems. It has its origin in the desire of humans for communication over a distance greater than that fe ...
and storage systems that move a medium past a fixed
recording head
A recording head is the physical interface between a recording apparatus and a moving recording medium. Recording heads are generally classified according to the physical principle that allows them to impress their data upon their medium. A record ...
.
Specifically, RLL bounds the length of stretches (runs) of repeated bits during which the signal does not change. If the runs are too long,
clock recovery
In serial communication of digital data, clock recovery is the process of extracting timing information from a serial data stream itself, allowing the timing of the data in the stream to be accurately determined without separate clock information. ...
is difficult; if they are too short, the high frequencies might be attenuated by the communications channel. By
modulating
In music, modulation is the change from one tonality ( tonic, or tonal center) to another. This may or may not be accompanied by a change in key signature (a key change). Modulations articulate or create the structure or form of many pieces, as ...
the
data
In the pursuit of knowledge, data (; ) is a collection of discrete values that convey information, describing quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpreted ...
, RLL reduces the timing uncertainty in
decoding the stored data, which would lead to the possible erroneous insertion or removal of bits when reading the data back. This mechanism ensures that the boundaries between bits can always be accurately found (preventing
bit slip
In digital transmission, bit slip is the loss or gain of a bit or bits, caused by clock drift – variations in the respective clock rates of the transmitting and receiving devices.
One cause of bit slippage is overflow of a receive buffer that ...
), while efficiently using the media to reliably store the maximal amount of data in a given space.
Early disk drives used very simple encoding schemes, such as RLL (0,1) FM code, followed by RLL (1,3) MFM code, which were widely used in
hard disk drive
A hard disk drive (HDD), hard disk, hard drive, or fixed disk is an electro-mechanical data storage device that stores and retrieves digital data using magnetic storage with one or more rigid rapidly rotating platters coated with magnet ...
s until the mid-1980s and are still used in digital optical discs such as
CD,
DVD
The DVD (common abbreviation for Digital Video Disc or Digital Versatile Disc) is a digital optical disc data storage format. It was invented and developed in 1995 and first released on November 1, 1996, in Japan. The medium can store any kin ...
,
MD,
Hi-MD
Hi-MD is a magneto-optical disc-based data storage format. It was a further development of the MiniDisc. With its release in later 2004, and
Blu-ray
The Blu-ray Disc (BD), often known simply as Blu-ray, is a digital optical disc data storage format. It was invented and developed in 2005 and released on June 20, 2006 worldwide. It is designed to supersede the DVD format, and capable of sto ...
. Higher-density RLL (2,7) and RLL (1,7) codes became the de facto industry standard for hard disks by the early 1990s.
Need for RLL coding
On a
hard disk drive
A hard disk drive (HDD), hard disk, hard drive, or fixed disk is an electro-mechanical data storage device that stores and retrieves digital data using magnetic storage with one or more rigid rapidly rotating platters coated with magnet ...
, information is represented by changes in the direction of the magnetic field on the disk, and on magnetic media, the playback output is proportional to the density of flux transition. In a computer, information is represented by the voltage on a wire. No voltage on the wire in relation to a defined ground level would be a binary zero, and a positive voltage on the wire in relation to ground represents a binary one. Magnetic media, on the other hand, always carries a magnetic flux either a "north" pole or a "south" pole. In order to convert the magnetic fields to binary data, some encoding method must be used to translate between the two.
One of the simplest practical codes, modified non-return-to-zero-inverted (
NRZI), simply encodes a 1 as a magnetic polarity transition, also known as a "flux reversal", and a zero as no transition. With the disk spinning at a constant rate, each bit is given an equal time period, a "data window", for the magnetic signal that represents that bit, and the flux reversal, if any, occurs at the start of this window. (Note: older hard disks used one fixed length of time as the data window over the whole disk, but modern disks are more complicated; for more on this, see
zoned bit recording In computer storage, zone bit recording (ZBR) is a method used by disk drives to optimise the tracks for increased data capacity. It does this by placing more sectors per zone on outer tracks than on inner tracks. This contrasts with other approac ...
.)
This method is not quite that simple, as the playback output is proportional to the density of ones, a long run of zeros means no playback output at all.
In a simple example, consider the binary pattern 101 with a data window of 1 ns (one nanosecond, or one billionth of a second). This will be stored on the disk as a change, followed by no change, and then another change. If the preceding magnetic polarity was already positive, the resulting pattern might look like this: −−+. A value of 255, or all binary ones, would be written as −+−+−+−+ or +−+−+−+−. A zero byte would be written as ++++++++ or −−−−−−−−. A 512-byte sector of zeros would be written as 4096 sequential bits with the same polarity.
Since a disk drive is a physical piece of hardware, the rotational speed of the drive can change slightly, due to a change in the motor speed or thermal expansion of the disk platter. The physical media on a floppy disk can also become deformed, causing larger timing errors, and the timing circuit on the controller itself may have small variations in speed. The problem is that, with a long string of zeros, there's no way for the disk drive's controller to know the exact position of the read head, and thus no way to know exactly how many zeros there are. A speed variation of even 0.1%, which is more precise than any practical floppy drive, could result in 4 bits being added to or removed from the 4096-bit data stream. Without some form of synchronization and error correction, the data would become completely unusable.
The other problem is due to the limits of magnetic media itself: it is only possible to write so many polarity changes in a certain amount of space, so there's an upper limit to how many ones can also be written sequentially, this depends on the linear velocity and the head gap.
To prevent this problem, data is coded in such a way that long repetitions of a single binary value do not occur. By limiting the number of zeros written consecutively, this makes it possible for the drive controller to stay synchronized. By limiting the number of ones written in a row, the overall frequency of polarity changes is reduced, allowing the drive to store more data in the same amount of space, resulting in either a smaller package for the same amount of data or more storage in the same size package.
History
All codes used to record on magnetic disks have limited the length of transition-free runs and can therefore be characterized as RLL codes. The earliest and simplest variants were given specific names, such as
modified frequency modulation (MFM), and the name "RLL" is commonly used only for the more complex variants not given such specific names, but the term technically applies to them all.
The first "RLL" code used in hard drives was RLL (2,7), developed by
IBM engineers and first used commercially in 1979 on the IBM 3370
DASD
A direct-access storage device (DASD) (pronounced ) is a secondary storage device in which "each physical record has a discrete location and a unique address". The term was coined by IBM to describe devices that allowed random access to data, t ...
, for use with the 4300 series
mainframe
A mainframe computer, informally called a mainframe or big iron, is a computer used primarily by large organizations for critical applications like bulk data processing for tasks such as censuses, industry and consumer statistics, enterprise ...
. During the late 1980s,
PC hard disks began using RLL proper (i.e. variants more complex than those that had received their own proper names, such as MFM). RLL codes have found almost universal application in optical-disc recording practice since 1980. In consumer electronics, RLLs like the
EFM code (rate = 8/17, ''d'' = 2, ''k'' = 10) are employed in the
Compact Disc
The compact disc (CD) is a Digital media, digital optical disc data storage format that was co-developed by Philips and Sony to store and play digital audio recordings. In August 1982, the first compact disc was manufactured. It was then rele ...
(CD) and
MiniDisc
MiniDisc (MD) is an erasable magneto-optical disc-based data storage format offering a capacity of 60, 74, and later, 80 minutes of digitized audio.
Sony announced the MiniDisc in September 1992 and released it in November of that year fo ...
(MD), and the
EFMPlus code (rate = 8/16, ''d'' = 2, ''k'' = 10) used in the
DVD
The DVD (common abbreviation for Digital Video Disc or Digital Versatile Disc) is a digital optical disc data storage format. It was invented and developed in 1995 and first released on November 1, 1996, in Japan. The medium can store any kin ...
. Parameters ''d'' and ''k'' are the minimal and maximal allowed run lengths. For more coverage on the storage technologies, the references cited in this article are useful.
Technical overview
Generally ''run length'' is the number of bits for which signal remains unchanged. A run length of 3 for bit 1, represents a sequence 111. For instance, the pattern of magnetic polarizations on the disk might be +−−−−++−−−++++++, with runs of length 1, 4, 2, 3, and 6. However, run-length limited coding terminology assumes NRZI encoding, so 1 bits indicate changes and 0 bits indicate the absence of change, the above sequence would be expressed as 11000101001000001, and only runs of zero bits are counted.
Somewhat confusingly, the run length is the number of zeros (0, 3, 1, 2 and 5 in the preceding) between adjacent ones, which is one less than the number of bit times the signal actually remains unchanged. Run-length limited sequences are characterized by two parameters, ''d'' and ''k'', which stipulate the minimal and maximal zero-bit run length that can occur in the sequence. So RLL codes are generally specified as (''d'',''k'') RLL, e.g.: (1,3) RLL.
Coding
In the encoded format a "1" bit indicates a flux transition, while a "0" indicates that the magnetic field on the disk does not change for that time interval.
FM: (0,1) RLL
Generally, the term "RLL code" is used to refer to more elaborate encodings, but the original Frequency Modulation code, also called
differential Manchester encoding
Differential Manchester encoding (DM) is a line code in digital frequency modulation in which data and clock signals are combined to form a single two-level self- synchronizing data stream. In various specific applications, this method is also ca ...
, can be seen as a simple rate-1/2 RLL code.
The added 1 bits are referred to as clock bits.
Example:
Data: 0 0 1 0 1 1 0 1 0 0 0 1 1 0
Encoded: 1010111011111011101010111110
Clock: 1 1 1 1 1 1 1 1 1 1 1 1 1 1
GCR: (0,2) RLL
By extending the maximal run length to 2 adjacent 0 bits, the data rate can be improved to 4/5. This is the original IBM group coded recording variant
Where possible (11 out of 16 codes), the bit pattern
abcd
is encoded by prefixing it with the complement of ''a'':
abcd
. In the 5 cases where this would violate one of the rules (
000d
or
ab00
), a code beginning with 11 is substituted (
11be
, where ''e'' = ''a'' ∨ ''d'').
Example:
Data: 0010 1101 0001 1000
Encoded: 10010011011101111010
Note that to meet the definition of (0,2) RLL, it is not sufficient only that each 5-bit code contain no more than two consecutive zeros, but it is also necessary that any pair of 5-bit codes as a combined sequentially not contain more than two consecutive zeros. That is, there must not be more than two zeros between the last one bit in the first code and the first one bit in the second code, for any two arbitrarily chosen codes. This is required because for any RLL code, the run-length limits 0 and 2 in this case apply to the overall modulated bitstream, not just to the components of it that represent discrete sequences of plain data bits. (This rule must hold for any arbitrary pair of codes, without exception, because the input data may be any arbitrary sequence of bits.) The IBM GCR code above meets this condition, since the maximal run length of zeros at the beginning of any 5-bit code is one, and likewise the maximal run length at the end of any code is one, making a total run length of two at the junction between adjacent codes. (An example of the maximal run length occurring between codes can be seen in the example given above, where the code for the data "0010" ends with a zero and the code for the next data, "1101", begins with a zero, forming a run of two zeros at the junction of these two 5-bit codes.)
MFM: (1,3) RLL
Modified frequency modulation begins to get interesting, because its special properties allow its bits to be written to a magnetic medium with twice the density of an arbitrary bit stream. There is a limit to how close in time flux transitions can be for reading equipment to detect them, and that constrains how closely bits can be recorded on the medium: In the worst case, with an arbitrary bit stream, there are two consecutive ones, which produces two consecutive flux transitions in time, so bits must be spaced far enough apart that there would be sufficient time between those flux transitions for the reader to detect them. But this code imposes a constraint of ''d'' = 1, i.e. there is a minimum of one zero between each two ones. This means that in the worst case, flux transitions are two bit times apart, so the bits can be twice as close together as with the arbitrary bit stream without exceeding the reader's capabilities.
This doubled recording density compensates for the 1/2 coding rate of this code (it takes two bits to represent one bit of real information) and makes it equivalent to a rate-1 code.
Here "x" is the complement of the previous encoded bit (which is also the previous data bit). Except for the clock bits that "x" bit, and the "0" in the "01" code this is the same as the FM table, and that is how this code gets its name. The inserted clock bits are 0 except between two 0 data bits.
Example:
Data: 0 0 1 0 1 1 0 1 0 0 0 1 1 0
Encoded: x010010001010001001010010100
Clock: x 1 0 0 0 0 0 0 0 1 1 0 0 0
(1,7) RLL
(1,7) RLL maps 2 bits of data onto 3 bits on the disk, and the encoding is done in 2- or 4-bit groups. The encoding rules are: (''x'', ''y'') becomes (NOT ''x'', ''x'' AND ''y'', NOT ''y''), except (''x'', 0, 0, ''y'') becomes (NOT ''x'', ''x'' AND ''y'', NOT ''y'', 0, 0, 0).
When encoding according to the table below, the ''longest'' (last in the table) match must be used; those are exceptions handling situations where applying the earlier rules would lead to a violation of the code constraints.
Example:
Data: 0 0 1 0 1 1 0 1 0 0 0 1 1 0
Encoded: 101 001 010 100 100 000 001
(2,7) RLL
(2,7) RLL is rate- code, mapping ''n'' bits of data onto 2''n'' bits on the disk, like MFM, but because the minimal run length is 50% longer (3 bit times instead of 2), the bits can be written faster, achieving 50% higher effective data density. The encoding is done in 2-, 3- or 4-bit groups.
Western Digital WD5010A, WD5011A, WD50C12
Seagate ST11R, IBM
Perstor Systems ADRC
The encoded forms begin with at most 4, and end with at most 3 zero bits, giving the maximal run length of 7.
Example:
Data: 1 1 0 1 1 0 0 1 1
Encoded: 1000 001000 00001000
HHH(1,13)
The HHH(1,13) code is a rate-2/3 code developed by three IBM researchers (Hirt, Hassner, and Heise) for use in the 16 MB/s
IrDA
The Infrared Data Association (IrDA) is an industry-driven interest group that was founded in 1994 by around 50 companies. IrDA provides specifications for a complete set of protocols for wireless infrared communications, and the name "IrDA" also ...
VFIR physical layer.
[.] Unlike magnetic encoding, this is designed for an infrared transmitter, where a 0 bit represents "off" and a 1 bit represents "on". Because 1 bits consume more power to transmit, this is designed to limit the density of 1 bits to less than 50%. In particular, it is a (1,13, 5) RLL code, where the final 5 indicates the additional constraint that there are at most 5 consecutive "10" bit pairs.
The first eight rows describe a standard (1,7)-RLL code. The additional six exceptions increase the maximal run of zeros to 13 (in the legal pattern 100 000 000 000 001, which represents 10 11 10 11, followed by 01), but limit the maximal average ones density to . The longest run of 1–0 pairs is 000 101 010 101 000.
This code limits the ones density to between and , with an average of 25.8%.
Examples
For example, let us encode the bit sequence 10110010 with different encodings
Densities
Suppose a magnetic tape can contain up to 3200 flux reversals per inch. A modified frequency modulation, or (1,3) RLL encoding, stores each data bit as two bits on tape, but since there is guaranteed to be one 0 (no flux reversal) bit between any 1 (flux reversal) bits, then it is possible to store 6400 encoded bits per inch on the tape, or 3200 data bits per inch. A (1,7) RLL encoding can also store 6400 encoded bits per inch on the tape, but since it only takes 3 encoded bits to store 2 data bits, this is 4267 data bits per inch. A (2,7) RLL encoding takes 2 encoded bits to store each data bit, but since there is guaranteed to be two 0 bits between any 1 bits, then it is possible to store 9600 encoded bits per inch on the tape, or 4800 data bits per inch.
The flux-reversal densities on hard drives are significantly greater, but the same improvements in storage density are seen by using different encoding systems.
See also
*
8b/10b encoding
In telecommunications, 8b/10b is a line code that maps 8-bit words to 10-bit symbols to achieve DC balance and bounded disparity, and at the same time provide enough state changes to allow reasonable clock recovery. This means that the diff ...
*
Bit slip
In digital transmission, bit slip is the loss or gain of a bit or bits, caused by clock drift – variations in the respective clock rates of the transmitting and receiving devices.
One cause of bit slippage is overflow of a receive buffer that ...
*
Eight-to-fourteen modulation
Eight-to-fourteen modulation (EFM) is a data encoding technique – formally, a ''line code'' – used by compact discs (CD), laserdiscs (LD) and pre-Hi-MD MiniDiscs. EFMPlus is a related code, used in DVDs and Super Audio CDs (SACDs).
EFM and ...
and EFMplus are DC-free (2,10) RLL codes used on CDs and DVDs, respectively.
*
Error correcting codes
In computing, telecommunication, information theory, and coding theory, an error correction code, sometimes error correcting code, (ECC) is used for controlling errors in data over unreliable or noisy communication channels. The central idea is ...
*
Line code
In telecommunication, a line code is a pattern of voltage, current, or photons used to represent digital data transmitted down a communication channel or written to a storage medium. This repertoire of signals is usually called a constrained c ...
*
Modulation
In electronics and telecommunications, modulation is the process of varying one or more properties of a periodic waveform, called the ''carrier signal'', with a separate signal called the ''modulation signal'' that typically contains informatio ...
*
Physical layer
In the seven-layer OSI model of computer networking, the physical layer or layer 1 is the first and lowest layer; The layer most closely associated with the physical connection between devices. This layer may be implemented by a PHY chip.
The ...
*
PRML
In computer data storage, partial-response maximum-likelihood (PRML) is a method for recovering the digital data from the weak analog read-back signal picked up by the head of a magnetic disk drive or tape drive. PRML was introduced to recover dat ...
*
Run-length encoding
Run-length encoding (RLE) is a form of lossless data compression in which ''runs'' of data (sequences in which the same data value occurs in many consecutive data elements) are stored as a single data value and count, rather than as the original ...
*
Self-synchronizing code
In coding theory, especially in telecommunications, a self-synchronizing code is a uniquely decodable code in which the symbol stream formed by a portion of one code word, or by the overlapped portion of any two adjacent code words, is not a val ...
and bit synchronization
*
Source coding
In information theory, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compression ...
References
External links
Digital Magnetic Tape Recording
{{DEFAULTSORT:Run-Length Limited
Audio storage
Rotating disc computer storage media
Video storage
Line codes
Physical layer protocols