Zstandard

Zstandard, commonly known by the name of its reference implementation zstd, is a lossless data compression algorithm developed by Yann Collet at Facebook. Zstd is the reference implementation in C. Version 1 of this implementation was released as open-source software on 31 August 2016.


Features

Zstandard was designed to give a compression ratio comparable to that of the DEFLATE algorithm (developed in 1991 and used in the original ZIP and gzip programs), but faster, especially for decompression. It is tunable with compression levels ranging from negative 7 (fastest) to 22 (slowest in compression speed, but best compression ratio). The zstd package includes parallel (multi-threaded) implementations of both compression and decompression. Starting from version 1.3.2 (October 2017), zstd optionally implements very long range search and deduplication (with a 128 MiB window), similar to rzip or lrzip. Compression speed can vary by a factor of 20 or more between the fastest and slowest levels, while decompression is uniformly fast, varying by less than 20% between the fastest and slowest levels. The Zstandard command-line tool has an "adaptive" mode that varies the compression level depending on I/O conditions, mainly how fast it can write the output. Zstd at its maximum compression level gives a compression ratio close to lzma, lzham, and ppmx, and performs better than lza and bzip2. Zstandard reaches the current Pareto frontier, as it decompresses faster than any other currently available algorithm with a similar or better compression ratio.

Dictionaries can have a large impact on the compression ratio of small files, so Zstandard can use a user-provided compression dictionary. It also offers a training mode, able to generate a dictionary from a set of samples. In particular, one dictionary can be loaded to process large sets of files with redundancy between files, but not necessarily within each file, for example log files.


Design

Zstandard combines a dictionary-matching stage (LZ77) with a large search window and a fast entropy-coding stage. It uses both Huffman coding (for entries in the Literals section) and finite-state entropy (FSE), a fast tabled version of asymmetric numeral systems (tANS), for entries in the Sequences section. Because of the way that FSE carries over state between symbols, decompression involves processing symbols within the Sequences section of each block in reverse order (from last to first).
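
The split between literals and sequences can be illustrated with a toy LZ77 parser. The sketch below is a hypothetical greedy, brute-force matcher (zstd's real match finders use hash chains, binary trees and repeat offsets, and are far faster). It produces a literals buffer plus (literal length, offset, match length) triples, roughly the kind of information a block's Literals and Sequences sections carry before entropy coding. The names Sequence, parse, reconstruct and MIN_MATCH are illustrative only.

```python
from typing import NamedTuple


class Sequence(NamedTuple):
    literal_length: int  # literals emitted before this match
    offset: int          # distance back to the start of the match
    match_length: int    # number of bytes copied from the match


MIN_MATCH = 3  # hypothetical; zstd's minimum match length is a tunable parameter


def parse(data: bytes, window: int = 1 << 16) -> tuple[bytes, list[Sequence]]:
    """Greedy LZ77 parse into (literals, sequences); brute force, for illustration."""
    literals = bytearray()
    sequences: list[Sequence] = []
    pos, pending = 0, 0  # pending = literals emitted since the last match
    while pos < len(data):
        best_len, best_off = 0, 0
        for cand in range(max(0, pos - window), pos):
            length = 0
            # A match may overlap the current position (e.g. a repeating run).
            while pos + length < len(data) and data[cand + length] == data[pos + length]:
                length += 1
            if length > best_len:
                best_len, best_off = length, pos - cand
        if best_len >= MIN_MATCH:
            sequences.append(Sequence(pending, best_off, best_len))
            pos += best_len
            pending = 0
        else:
            literals.append(data[pos])
            pending += 1
            pos += 1
    return bytes(literals), sequences


def reconstruct(literals: bytes, sequences: list[Sequence]) -> bytes:
    out = bytearray()
    lit = 0
    for seq in sequences:
        out += literals[lit:lit + seq.literal_length]
        lit += seq.literal_length
        start = len(out) - seq.offset
        for i in range(seq.match_length):  # byte by byte so overlapping copies work
            out.append(out[start + i])
    out += literals[lit:]  # trailing literals after the last match
    return bytes(out)
```

For example, parse(b"abcabcabcabcxyzxyz") yields the literals b"abcxyz" and two sequences, (3, 3, 9) and (3, 3, 3); reconstruct rebuilds the original input from them.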

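The reverse-order property comes from asymmetric numeral systems being last-in, first-out: the decoder recovers symbols in the opposite order to the one in which the encoder consumed them. The sketch below uses range ANS (rANS) on an unbounded Python integer, a swapped-in variant chosen for brevity rather than zstd's tabled FSE with its bit-level renormalization. The symbol frequencies and function names are hypothetical.

```python
# Hypothetical symbol frequencies; their total M is the ANS table size.
FREQ = {"a": 5, "b": 2, "c": 1}
M = sum(FREQ.values())

# CUM[s] is the first slot in [0, M) assigned to symbol s.
CUM: dict[str, int] = {}
_start = 0
for _sym, _f in FREQ.items():
    CUM[_sym] = _start
    _start += _f


def rans_encode(symbols, state: int = M) -> int:
    """Push each symbol onto the integer state (no renormalization)."""
    for s in symbols:
        f, c = FREQ[s], CUM[s]
        state = (state // f) * M + c + (state % f)
    return state


def rans_decode(state: int, count: int) -> tuple[list[str], int]:
    """Pop `count` symbols; they come out in reverse encoding order."""
    out = []
    for _ in range(count):
        slot = state % M
        s = next(sym for sym in FREQ if CUM[sym] <= slot < CUM[sym] + FREQ[sym])
        state = FREQ[s] * (state // M) + slot - CUM[s]
        out.append(s)
    return out, state


if __name__ == "__main__":
    message = list("abcab")
    decoded, final_state = rans_decode(rans_encode(message), len(message))
    print("".join(decoded))  # bacba: the last symbol encoded is decoded first
    assert final_state == M  # the state returns to its initial value
    # Feeding the encoder in reverse makes the decoder emit the original order.
    forward, _ = rans_decode(rans_encode(reversed(message)), len(message))
    assert forward == message
```

Whichever side works back to front, encoder and decoder must agree on the order in advance, since the coded state itself behaves like a stack.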

Usage

The Linux kernel has included Zstandard since November 2017 (version 4.14) as a compression method for the btrfs and squashfs filesystems. In 2017, Allan Jude integrated Zstandard into the FreeBSD kernel, and it was subsequently integrated as a compressor option for core dumps (both user programs and kernel panics). It was also used to create a proof-of-concept OpenZFS compression method, which was integrated in 2020. The AWS Redshift and RocksDB databases include support for field compression using Zstandard.

In March 2018, Canonical tested the use of zstd as the default deb package compression method for the Ubuntu Linux distribution. Compared with xz compression of deb packages, zstd at level 19 decompresses significantly faster, but at the cost of 6% larger package files. Support was added to Debian (and subsequently, Ubuntu) in April 2018 (in version 1.6~rc1). In 2018 the algorithm was published as an RFC, which also defines an associated media type "application/zstd", filename extension "zst", and HTTP content encoding "zstd".
Arch Linux added support for zstd as a package compression method in October 2019 with the release of the pacman 5.2 package manager, and in January 2020 switched from xz to zstd for the packages in the official repository. Arch uses the command zstd -c -T0 --ultra -20 -; compared with xz, the combined size of all compressed packages increased by 0.8%, decompression is 13 times faster, decompression memory increased by 50 MiB when using multiple threads, and compression memory increases but scales with the number of threads used. Arch Linux later also switched to zstd as the default compression algorithm for the mkinitcpio initial ramdisk generator.
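
The Arch Linux command line above can be assembled programmatically. In that command, -c writes to standard output, -T0 asks zstd to detect and use the number of physical CPU cores, the trailing - reads standard input, and --ultra is required to unlock levels above 19. This is a minimal sketch: the helper names are hypothetical, only levels 1 to 22 are handled, and a zstd executable on PATH is assumed only if compress_with_cli is actually called.

```python
import subprocess

MAX_REGULAR_LEVEL = 19  # levels 20-22 additionally need --ultra
MAX_LEVEL = 22


def zstd_argv(level: int, threads: int = 0) -> list[str]:
    """Build a zstd command that compresses stdin to stdout."""
    if not 1 <= level <= MAX_LEVEL:
        raise ValueError(f"level must be 1..{MAX_LEVEL} in this sketch")
    argv = ["zstd", "-c", f"-T{threads}"]  # -T0: detect the core count
    if level > MAX_REGULAR_LEVEL:
        argv.append("--ultra")
    argv += [f"-{level}", "-"]  # "-" means read from standard input
    return argv


def compress_with_cli(data: bytes, level: int = 19) -> bytes:
    # Requires a zstd executable on PATH; raises CalledProcessError on failure.
    result = subprocess.run(zstd_argv(level), input=data, capture_output=True, check=True)
    return result.stdout
```

With these defaults, zstd_argv(20) reproduces the argument list of the Arch command.
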
Fedora added Zstandard support to RPM in May 2018 (Fedora release 28) and used it for packaging the release in October 2019 (Fedora 31). In Fedora 33, the filesystem is compressed by default with zstd.

A full implementation of the algorithm, with an option to choose the compression level, is used in the .NSZ/.XCZ file formats developed by the homebrew community for the Nintendo Switch hybrid game console. Similarly, Zstandard is one of many supported compression algorithms in the .RVZ disc image file format for the Nintendo Wii and Nintendo GameCube.

In 2020, Zstandard was added to version 6.3.8 of the ZIP file format with codec (compression method) number 93. The previous number, 20, from version 6.3.7 was deprecated, so two method numbers may be encountered for Zstandard data in ZIP files. Newer versions of ZIP programs often support this feature. 7-Zip ZS, a fork of 7-Zip FM with Zstandard (and other formats) support, is developed by Tino Reichardt; at the time of writing it supports Zstandard 1.5.0. Modern7z, a Zstandard (and other formats) plugin for 7-Zip FM, is developed by Denis Anisimov (TC4shell). A newer version of p7zip also supports Zstandard 1.4.9.
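
To show where that codec number lives, the sketch below reads the compression-method field of a ZIP local file header: a little-endian 16-bit value at byte offset 8, after the 4-byte signature and the version-needed and flag fields. It labels method 93 as Zstandard and the deprecated method 20 accordingly. The function name and labels are hypothetical, and apart from stored (0) and deflate (8), only the methods mentioned above are recognized.

```python
import struct

LOCAL_HEADER_SIG = 0x04034B50  # "PK\x03\x04" read as a little-endian integer

METHOD_NAMES = {
    0: "stored",
    8: "deflate",
    20: "zstd (deprecated method number)",
    93: "zstd",
}


def local_header_method(header: bytes) -> str:
    """Return a label for the compression method in a ZIP local file header."""
    if len(header) < 10:
        raise ValueError("truncated local file header")
    signature, _version, _flags, method = struct.unpack("<IHHH", header[:10])
    if signature != LOCAL_HEADER_SIG:
        raise ValueError("not a ZIP local file header")
    return METHOD_NAMES.get(method, f"other ({method})")
```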


License

The reference implementation is licensed under the BSD license and published on GitHub. From version 1.0, it also included an additional Grant of Patent Rights. From version 1.3.1, this patent grant was dropped and the license was changed to a BSD + GPLv2 dual license ("New license"; GitHub, "facebook/zstd").


See also

* zlib
* LZFSE – a similar algorithm by Apple, used since iOS 9 and OS X 10.11 and made open source on 1 June 2016
* LZ4 – a fast member of the LZ77 family


External links

* Smaller and faster data compression with Zstandard, Yann Collet and Chip Turner, 31 August 2016, Facebook announcement
* The Guardian is using ZStandard instead of zlib