Compress

compress is a Unix shell compression program based on the LZW compression algorithm. Compared to more modern compression utilities such as gzip and bzip2, compress runs faster and uses less memory, at the cost of a significantly lower compression ratio. The uncompress utility restores files compressed by the ''compress'' utility to their original state. If no files are specified, the standard input is uncompressed to the standard output. In the upcoming POSIX and Single Unix Specification revision, it is planned that the DEFLATE algorithm used in the gzip format be supported in those utilities.
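The core LZW technique that compress builds on can be illustrated with a minimal sketch. It is generic LZW over bytes, not the compress implementation itself: it returns codes as a plain list of integers instead of packing them into 9- to 16-bit fields, and it omits the CLEAR code (256) that compress reserves in block mode. The `max_bits` parameter here only caps the dictionary size, and the function names are hypothetical.

```python
def lzw_compress(data: bytes, max_bits: int = 16) -> list[int]:
    """Encode bytes as LZW codes (generic LZW, not compress's exact format)."""
    limit = 1 << max_bits                   # dictionary holds codes 0 .. limit-1
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes: list[int] = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in dictionary:
            current = candidate             # keep extending the longest known string
            continue
        codes.append(dictionary[current])   # emit the code for the longest match
        if next_code < limit:               # stop growing once the table is full
            dictionary[candidate] = next_code
            next_code += 1
        current = bytes([value])
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decompress(codes: list[int], max_bits: int = 16) -> bytes:
    """Invert lzw_compress by rebuilding the same dictionary on the fly."""
    if not codes:
        return b""
    limit = 1 << max_bits
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code and next_code < limit:
            # The code names the entry that is being defined right now.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code {code}")
        out.append(entry)
        if next_code < limit:               # mirror the compressor's table growth
            dictionary[next_code] = previous + entry[:1]
            next_code += 1
        previous = entry
    return b"".join(out)
```

For example, `lzw_compress(b"TOBEORNOTTOBEORTOBEORNOT")` yields `[84, 79, 66, 69, 79, 82, 78, 79, 84, 256, 258, 260, 265, 259, 261, 263]`, 16 codes for 24 input bytes, and `lzw_decompress` restores the original input.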


Description of program

Files compressed by ''compress'' are typically given the extension ".Z" (modeled after the earlier pack program, which used the extension ".z"). Most ''tar'' programs will pipe their data through ''compress'' when given the command-line option "-Z". (The ''tar'' program on its own does not compress; it just stores multiple files within one tape archive.) Files can be returned to their original state using ''uncompress''. The usual action of ''uncompress'' is not merely to create an uncompressed copy of the file, but also to restore the timestamp and other attributes of the compressed file. For files produced by ''compress'' on other systems, ''uncompress'' supports 9- to 16-bit compression.
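The code width used by a .Z file is recorded in the file itself. The sketch below assumes the commonly documented .Z header layout, which this article does not describe: the two magic bytes 0x1F 0x9D, followed by a flags byte whose low five bits give the maximum code width and whose high bit marks "block mode". `parse_z_header` and the constant names are hypothetical.

```python
COMPRESS_MAGIC = b"\x1f\x9d"   # assumed first two bytes of a .Z file
BLOCK_MODE_FLAG = 0x80          # assumed: bit 7 of the third byte
MAX_BITS_MASK = 0x1F            # assumed: low 5 bits of the third byte


def parse_z_header(data: bytes) -> tuple[int, bool]:
    """Return (max_bits, block_mode) from the 3-byte header of a .Z stream."""
    if len(data) < 3 or data[:2] != COMPRESS_MAGIC:
        raise ValueError("not a compress (.Z) stream")
    flags = data[2]
    max_bits = flags & MAX_BITS_MASK
    block_mode = bool(flags & BLOCK_MODE_FLAG)
    if not 9 <= max_bits <= 16:
        # uncompress only handles 9- to 16-bit streams
        raise ValueError(f"unsupported code width: {max_bits} bits")
    return max_bits, block_mode
```

For example, `parse_z_header(b"\x1f\x9d\x90")` returns `(16, True)`: 0x90 has the high bit set and a width of 16 in its low five bits.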


History

Sperry Research Center applied for a patent on the LZW algorithm used in compress in 1983. Terry Welch published an IEEE article on the algorithm in 1984, but failed to note that he had applied for a patent on it. Spencer Thomas of the University of Utah took this article and implemented compress in 1984, without realizing that a patent was pending on the LZW algorithm. The GIF image format also incorporated LZW compression in this way, and Unisys later claimed royalties on implementations of GIF. Joseph M. Orost led the team and worked with Thomas et al. to create the 'final' (4.0) version of compress and published it as free software to the 'net.sources' USENET group in 1985. The patent was granted in 1985, and this is why compress could not be used without paying royalties to Sperry Research, which was eventually merged into Unisys.

compress has fallen out of favor in particular user groups because it makes use of the LZW algorithm, which was covered by a Unisys patent. Because of this, gzip and bzip2 increased in popularity on Linux-based operating systems, due to their alternative algorithms and better compression. compress has, however, maintained a presence on Unix and BSD systems, and the compress and uncompress commands have also been ported to the IBM i operating system. The US LZW patent expired in 2003, so LZW is now in the public domain in the United States. All patents on LZW worldwide have also expired (see Graphics Interchange Format#Unisys and LZW patent enforcement).


Special output format

The output binary consists of bit groups. Each bit group consists of codes with a fixed number of bits (9 to 16). Each group except the last must be aligned to a multiple of its code width multiplied by 8 bits and right-padded with zeroes. The last group must be aligned to 8 bits (one byte) and padded with zeroes. More information can be found in an ncompress issue.

Example: to output ten 9-bit codes, five 10-bit codes and thirteen 11-bit codes, there are three groups of bits to output: 90 bits, 50 bits and 143 bits.
* The first group is 90 bits of data plus 54 zero bits of padding, so that its length is a multiple of 72 bits (9 bits × 8).
* The second group is 50 bits of data plus 30 zero bits of padding, so that its length is a multiple of 80 bits (10 bits × 8).
* The third group is 143 bits of data plus 1 zero bit of padding, so that its length is a multiple of 8 bits (one byte), since it is the last group in the output.

This alignment is actually a bug: LZW itself does not require any alignment. The bug is part of the original UNIX compress, ncompress, gzip and even the Windows port, and it has existed for more than 35 years. All ''application/x-compress'' files were created with this behavior, so it has to be included in the output specification. Some compress implementations write random bits from an uninitialized buffer as alignment bits, so there is no guarantee that the alignment bits are zeroes. For full compatibility, a decompressor must therefore ignore the values of the alignment bits.
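The alignment rule can be reproduced with a short sketch. It follows the rule exactly as stated in this section and measures each group's padding from the start of that group (the 3-byte file header is assumed to be excluded). It also packs codes least-significant-bit first, which is an assumption about compress's bit order that this article does not state. `group_padding` and `pack_groups` are hypothetical helper names.

```python
def group_padding(width: int, count: int, last: bool) -> tuple[int, int]:
    """Return (data_bits, padding_bits) for `count` codes of `width` bits."""
    data_bits = width * count
    # Non-final groups are padded to a multiple of width * 8 bits (that is,
    # `width` whole bytes); the final group only needs a byte boundary.
    alignment = 8 if last else width * 8
    return data_bits, -data_bits % alignment


def pack_groups(groups: list[tuple[int, list[int]]]) -> bytes:
    """Pack (width, codes) groups LSB-first, inserting zero alignment bits."""
    accumulator = 0   # bit i of the output stream is bit i of this integer
    position = 0      # number of bits emitted so far
    for index, (width, codes) in enumerate(groups):
        for code in codes:
            if not 0 <= code < (1 << width):
                raise ValueError(f"code {code} does not fit in {width} bits")
            accumulator |= code << position
            position += width
        _, padding = group_padding(width, len(codes), index == len(groups) - 1)
        position += padding   # padding bits are left as zeroes
    return accumulator.to_bytes(position // 8, "little")
```

With the three groups from the example, `pack_groups` produces 144 + 80 + 144 = 368 bits, or 46 bytes. A decoder, by contrast, would skip the same number of padding bits without checking their values, since some implementations leave them uninitialized.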


Standardization and availability

compress was standardized in the X/Open CAE Specification in 1994, and further in The Open Group Base Specifications, Issues 6 and 7. The Linux Standard Base does not require compress. compress is often not installed by default in Linux distributions, but can be installed from an additional package. compress is available for FreeBSD, OpenBSD, MINIX, Solaris and AIX.


See also

* Data compression
* Image compression
* List of Unix commands
* gzip


External links

* ncompress - public domain compress/uncompress implementation for POSIX systems
* compress - original Unix compress (in a compress'd archive)
* compress - original Unix compress executable (gzip'd)
* Source Code for compress v4.0 (gzip'd sharchives)
* ZIP File containing a Windows port of the compress utility
* source code to the current version of fcompress.c from compress
* bit groups alignment - Explanation of bit groups alignment.
* lzws - New library and CLI, implemented without legacy code.
* ruby-lzws - Ruby bindings with streaming support.
* compress.com - official website for file compression.