HOME

TheInfoList



OR:

7z is a compressed
archive file format In computing, an archive file stores the content of one or more files, possibly compressed, with associated metadata such as file name, directory structure, error detection and correction information, commentary, compressed data archives, stor ...
that supports several different
data compression In information theory, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compressi ...
,
encryption In Cryptography law, cryptography, encryption (more specifically, Code, encoding) is the process of transforming information in a way that, ideally, only authorized parties can decode. This process converts the original representation of the inf ...
and pre-processing algorithms. The 7z format initially appeared as implemented by the
7-Zip 7-Zip is a free and open-source file archiver, a utility used to place groups of files within compressed containers known as "archives". It is developed by Igor Pavlov and was first released in 1999. 7-Zip has its own Archive file, archive forma ...
archiver. The 7-Zip program is publicly available under the terms of the
GNU Lesser General Public License The GNU Lesser General Public License (LGPL) is a free-software license published by the Free Software Foundation (FSF). The license allows developers and companies to use and integrate a software component released under the LGPL into their own ...
. The LZMA SDK 4.62 was placed in the
public domain The public domain (PD) consists of all the creative work to which no Exclusive exclusive intellectual property rights apply. Those rights may have expired, been forfeited, expressly Waiver, waived, or may be inapplicable. Because no one holds ...
in December 2008. The latest stable version of 7-Zip and LZMA SDK is version 24.09. The 7z file format specification is distributed with 7-Zip's source code since 2015. The specification can be found in plain text format in the "doc" sub-directory of the source code distribution.


Features and enhancements

The 7z format provides the following main features: *
Open Open or OPEN may refer to: Music * Open (band), Australian pop/rock band * The Open (band), English indie rock band * ''Open'' (Blues Image album), 1969 * ''Open'' (Gerd Dudek, Buschi Niebergall, and Edward Vesala album), 1979 * ''Open'' (Go ...
, modular architecture that allows any compression, conversion, or encryption method to be stacked. * High
compression ratio The compression ratio is the ratio between the maximum and minimum volume during the compression stage of the power cycle in a piston or Wankel engine. A fundamental specification for such engines, it can be measured in two different ways. Th ...
s (depending on the compression method used). * AES-256 bit
encryption In Cryptography law, cryptography, encryption (more specifically, Code, encoding) is the process of transforming information in a way that, ideally, only authorized parties can decode. This process converts the original representation of the inf ...
. * Zip 2.0 (Legacy) Encryption * Large file support (up to approximately 16
exbibyte The byte is a units of information, unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character (computing), character of text in a computer and for this ...
s, or 264 bytes). *
Unicode Unicode or ''The Unicode Standard'' or TUS is a character encoding standard maintained by the Unicode Consortium designed to support the use of text in all of the world's writing systems that can be digitized. Version 16.0 defines 154,998 Char ...
file names. * Support for
solid compression In computing, solid compression is a method for data compression of multiple files, wherein all the uncompressed files are concatenated and treated as a single data block. Such an archive is called a solid archive. It is used natively in the 7z a ...
, where multiple files of similar type are compressed within a single stream, in order to exploit the combined redundancy inherent in similar files. * Compression and encryption of archive headers. * Support for multi-part archives : e.g. xxx.7z.001, xxx.7z.002, ... (see the context menu items ''Split File...'' to create them and ''Combine Files...'' to re-assemble an archive from a set of multi-part component files). * Support for custom codec plugin DLLs. The format's
open architecture Open architecture is a type of computer architecture or software architecture intended to make adding, upgrading, and swapping components with other computers easy. For example, the IBM PC, Amiga 2000 and Apple IIe have an open architecture supp ...
allows additional future compression methods to be added to the standard.


Compression methods

The following compression methods are currently defined: * LZMA – A variation of the
LZ77 LZ77 and LZ78 are the two lossless data compression algorithms published in papers by Abraham Lempel and Jacob Ziv in 1977 and 1978. They are also known as Lempel-Ziv 1 (LZ1) and Lempel-Ziv 2 (LZ2) respectively. These two algorithms form the basis ...
algorithm, using a sliding dictionary up to 4 GB in length for duplicate string elimination. The LZ stage is followed by
entropy coding In information theory, an entropy coding (or entropy encoding) is any lossless data compression method that attempts to approach the lower bound declared by Shannon's source coding theorem, which states that any lossless data compression method ...
using a
Markov chain In probability theory and statistics, a Markov chain or Markov process is a stochastic process describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Informally ...
-based range coder and
binary tree In computer science, a binary tree is a tree data structure in which each node has at most two children, referred to as the ''left child'' and the ''right child''. That is, it is a ''k''-ary tree with . A recursive definition using set theor ...
s. * LZMA2 – modified version of LZMA providing better multithreading support and less expansion of incompressible data. *
Bzip2 bzip2 is a free and open-source file compression program that uses the Burrows–Wheeler algorithm. It only compresses single files and is not a file archiver. It relies on separate external utilities such as tar for tasks such as handli ...
 – The standard
Burrows–Wheeler transform The Burrows–Wheeler transform (BWT) rearranges a character string into runs of similar characters, in a manner that can be reversed to recover the original string. Since compression techniques such as move-to-front transform and run-length enc ...
algorithm. Bzip2 uses two reversible transformations; BWT, then Move to front with
Huffman coding In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression. The process of finding or using such a code is Huffman coding, an algorithm developed by ...
for symbol reduction (the actual compression element). * PPMd – Dmitry Shkarin's 2002 PPMdH (PPMII (Prediction by Partial matching with Information Inheritance) and cPPMII (complicated PPMII)) with small changes: PPMII is an improved version of the 1984 PPM compression algorithm (prediction by partial matching). * DEFLATE – Standard algorithm based on 32 kB
LZ77 LZ77 and LZ78 are the two lossless data compression algorithms published in papers by Abraham Lempel and Jacob Ziv in 1977 and 1978. They are also known as Lempel-Ziv 1 (LZ1) and Lempel-Ziv 2 (LZ2) respectively. These two algorithms form the basis ...
and
Huffman coding In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression. The process of finding or using such a code is Huffman coding, an algorithm developed by ...
. Deflate is found in several file formats including ZIP,
gzip gzip is a file format and a software application used for file compression and decompression. The program was created by Jean-loup Gailly and Mark Adler as a free software replacement for the compress program used in early Unix systems, and ...
, PNG and
PDF Portable document format (PDF), standardized as ISO 32000, is a file format developed by Adobe Inc., Adobe in 1992 to present documents, including text formatting and images, in a manner independent of application software, computer hardware, ...
. 7-Zip contains a from-scratch DEFLATE encoder that frequently beats the ''de facto'' standard
zlib zlib ( or "zeta-lib", ) is a software library used for data compression as well as a data format. zlib was written by Jean-loup Gailly and Mark Adler and is an abstraction of the DEFLATE compression algorithm used in their gzip file compre ...
version in compression size, but at the expense of CPU usage. A suite of recompression tools called AdvanceCOMP contains a copy of the DEFLATE encoder from the 7-Zip implementation; these utilities can often be used to further compress the size of existing
gzip gzip is a file format and a software application used for file compression and decompression. The program was created by Jean-loup Gailly and Mark Adler as a free software replacement for the compress program used in early Unix systems, and ...
, ZIP, PNG, or MNG files.


Pre-processing filters

The LZMA SDK comes with the BCJ and BCJ2 preprocessors included, so that later stages are able to achieve greater compression: For
x86 x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel, based on the 8086 microprocessor and its 8-bit-external-bus variant, the 8088. Th ...
, ARM,
PowerPC PowerPC (with the backronym Performance Optimization With Enhanced RISC – Performance Computing, sometimes abbreviated as PPC) is a reduced instruction set computer (RISC) instruction set architecture (ISA) created by the 1991 Apple Inc., App ...
(PPC), IA-64
Itanium Itanium (; ) is a discontinued family of 64-bit computing, 64-bit Intel microprocessors that implement the Intel Itanium architecture (formerly called IA-64). The Itanium architecture originated at Hewlett-Packard (HP), and was later jointly dev ...
, and
ARM Thumb ARM (stylised in lowercase as arm, formerly an acronym for Advanced RISC Machines and originally Acorn RISC Machine) is a family of RISC instruction set architectures (ISAs) for computer processors. Arm Holdings develops the ISAs and lice ...
processors, jump targets are "normalized" before compression by changing relative position into absolute values. For x86, this means that near jumps, calls and conditional jumps (but not short jumps and conditional jumps) are converted from the machine language "jump 1655 bytes backwards" style notation to normalized "jump to address 5554" style notation; all jumps to 5554, perhaps a common subroutine, are thus encoded identically, making them more compressible. * BCJ – Converter for 32-bit x86 executables. Normalises target addresses of near jumps and calls from relative distances to absolute destinations. * BCJ2 – Pre-processor for x86-64 executables. BCJ2 is an improvement on BCJ, adding additional x86 jump/call instruction processing. Near jump, near call, conditional near jump targets are split out and compressed separately in another stream. *
Delta encoding Delta encoding is a way of storing or transmitting data in the form of '' differences'' (deltas) between sequential data rather than complete files; more generally this is known as data differencing. Delta encoding is sometimes called delta comp ...
 – delta filter, basic preprocessor for multimedia data. Similar executable pre-processing technology is included in other software; the RAR compressor features displacement compression for 32-bit x86 executables and IA-64 executables, and the UPX runtime executable file compressor includes support for working with 16-bit values within DOS binary files.


Encryption

The 7z format supports
encryption In Cryptography law, cryptography, encryption (more specifically, Code, encoding) is the process of transforming information in a way that, ideally, only authorized parties can decode. This process converts the original representation of the inf ...
with the AES algorithm with a 256-bit key. The key is generated from a user-supplied
passphrase A passphrase is a sequence of words or other text used to control access to a computer system, program or data. It is similar to a password in usage, but a passphrase is generally longer for added security. Passphrases are often used to control ...
using an algorithm based on the
SHA-256 SHA-2 (Secure Hash Algorithm 2) is a set of cryptographic hash functions designed by the United States National Security Agency (NSA) and first published in 2001. They are built using the Merkle–Damgård construction, from a one-way compressi ...
hash function. The SHA-256 is executed 219 (524288) times, which causes a significant delay on slow PCs before compression or extraction starts. This technique is called
key stretching In cryptography, key stretching techniques are used to make a possibly weak key, typically a password or passphrase, more secure against a brute-force attack by increasing the resources (time and possibly space) it takes to test each possible ke ...
and is used to make a
brute-force search In computer science, brute-force search or exhaustive search, also known as generate and test, is a very general problem-solving technique and algorithmic paradigm that consists of Iteration#Computing, systematically checking all possible candida ...
for the passphrase more difficult. Current GPU-based, and custom hardware attacks limit the effectiveness of this particular method of key stretching,
Colin Percival Colin A. Percival (born 1980) is a Canadian computer scientist and computer security researcher. He completed his undergraduate education at Simon Fraser University and a doctorate at the University of Oxford. While at university he joined the F ...

scrypt
. As presented i
"Stronger Key Derivation via Sequential Memory-Hard Functions"
. presented at BSDCan'09, May 2009.
so it is still important to choose a strong password. The 7z format provides the option to encrypt the filenames of a 7z archive.


Limitations

The 7z format does not store filesystem permissions (such as
UNIX Unix (, ; trademarked as UNIX) is a family of multitasking, multi-user computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, a ...
owner/group permissions or
NTFS NT File System (NTFS) (commonly called ''New Technology File System'') is a proprietary journaling file system developed by Microsoft in the 1990s. It was developed to overcome scalability, security and other limitations with File Allocation Tabl ...
ACLs), and hence can be inappropriate for backup/archival purposes. A workaround on UNIX-like systems for this is to convert data to a tar bitstream before compressing with 7z. But GNU tar (common in many UNIX environments) can also compress with the LZMA2 algorithm (" xz") natively, without the use of 7z, using the "-J" switch. The resulting file extension is ".tar.xz" or ".txz" and not ".tar.7z". This method of compression has been adopted with many distributions for packaging, such as Arch, Debian (deb), Fedora (rpm) and Slackware. (The older "lzma" format is less efficient.) On the other hand, it is important to note, that tar does not save the filesystem encoding, which means that tar compressed filenames can become unreadable if decompressed on a different computer. The 7z format does not allow extraction of some "broken files"—that is (for example) if one has the first segment of a series of 7z files, 7z cannot give the start of the files within the archive—it must wait until all segments are downloaded. The 7z format also lacks recovery records, making it vulnerable to
data degradation Data degradation is the gradual Data corruption, corruption of Data (computing), computer data due to an accumulation of non-critical failures in a data storage device. It is also referred to as data decay, data rot or bit rot. This results in ...
unless used in conjunction with external solutions, like parchives, or within filesystems with robust error-correction. By way of comparison, zip files also lack a recovery feature while the rar format has one.


See also

*
7-Zip 7-Zip is a free and open-source file archiver, a utility used to place groups of files within compressed containers known as "archives". It is developed by Igor Pavlov and was first released in 1999. 7-Zip has its own Archive file, archive forma ...
* Comparison of archive formats *
List of archive formats This is a list of file formats used by file archiver, archivers and data compression, compressors used to create Archive file, archive files. Archive formats by purpose Archive formats are used for backups, mobility, and archiving. Many archive ...
*
Open file format An open file format is a file format for storing digital data, defined by an openly published specification usually maintained by a standards organization, and which can be used and implemented by anyone. An open file format is licensed with a ...


References


Further reading

*


External links

* * {{Archive formats Computer-related introductions in 1999 Archive formats Russian inventions