
Parchive (a portmanteau of parity archive, and formally known as the Parity Volume Set specification) is an erasure code system that produces par files for checksum verification of data integrity, with the capability to perform data recovery operations that can repair or regenerate corrupted or missing data. Parchive was originally written to solve the problem of reliable file sharing on Usenet, but it can be used to protect any kind of data from data corruption, disc rot, bit rot, and accidental or malicious damage. Despite the name, Parchive uses more advanced techniques (specifically error correction codes) than simplistic parity methods of error detection.

As of 2014, PAR1 was obsolete, PAR2 was mature for widespread use, and PAR3 was a discontinued experimental version developed by MultiPar author Yutaka Sawada. The original SourceForge Parchive project has been inactive since April 30, 2015. A new PAR3 specification has been in development since April 28, 2019, led by PAR2 specification author Michael Nahas. An alpha version of this specification was published on January 29, 2022, while the program itself is still being developed.
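To make the contrast with "simplistic parity methods" concrete, here is a minimal sketch of plain XOR parity (the function names are hypothetical). One parity block built this way can rebuild exactly one missing block and nothing more, which is the limitation the error correction codes used by Parchive remove.

```python
from functools import reduce


def xor_parity(blocks):
    """Byte-wise XOR of equal-length blocks: the simple parity approach."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must all have the same length")
    # zip(*blocks) walks the blocks column by column (one byte from each).
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_one(blocks, parity):
    """Recover the single block marked None by XOR-ing everything else."""
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) != 1:
        raise ValueError("one XOR parity block can repair exactly one missing block")
    present = [b for b in blocks if b is not None] + [parity]
    restored = list(blocks)
    restored[missing[0]] = xor_parity(present)
    return restored


data = [b"abc", b"def", b"ghi"]
parity = xor_parity(data)
print(rebuild_one([data[0], None, data[2]], parity) == data)
```

Losing a second block makes `rebuild_one` raise, because a single XOR equation cannot determine two unknowns.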


History

Parchive was intended to increase the reliability of transferring files via Usenet newsgroups. Usenet was originally designed for informal conversations, and the underlying protocol, NNTP, was not designed to transmit arbitrary binary data. Another limitation, which was acceptable for conversations but not for files, was that messages were normally fairly short and limited to 7-bit ASCII text.

Various techniques were devised to send files over Usenet, such as uuencoding and Base64. Later Usenet software allowed 8-bit Extended ASCII, which permitted new techniques like yEnc. Large files were broken up to reduce the effect of a corrupted download, but the unreliable nature of Usenet remained.

With the introduction of Parchive, parity files could be created and uploaded along with the original data files. If any of the data files were damaged or lost while being propagated between Usenet servers, users could download parity files and use them to reconstruct the damaged or missing files.

Parchive also defined small index files (*.par in version 1 and *.par2 in version 2) that do not contain any recovery data. These indexes contain file hashes that can be used to quickly identify the target files and verify their integrity. Because the index files were so small, they minimized the amount of extra data that had to be downloaded from Usenet to verify that the data files were all present and undamaged, or to determine how many parity volumes were required to repair any damage or reconstruct any missing files. They were most useful in version 1, where the parity volumes were much larger than the short index files. These larger parity volumes contain the actual recovery data along with a duplicate copy of the information in the index files, which allows them to be used on their own to verify the integrity of the data files if no small index file is available.

In July 2001, Tobias Rieper and Stefan Wehlus proposed the Parity Volume Set specification, and with the assistance of other project members, version 1.0 of the specification was published in October 2001. Par1 used Reed–Solomon error correction to create new recovery files. Any of the recovery files can be used to rebuild a missing file from an incomplete download.

Version 1 became widely used on Usenet, but it suffered some limitations:
* It was restricted to handle at most 255 files.
* The recovery files had to be the size of the largest input file, so it did not work well when the input files were of various sizes. (This limited its usefulness when not paired with the proprietary RAR compression tool.)
* The recovery algorithm had a bug, due to a flaw in the academic paper on which it was based.
* It was strongly tied to Usenet, and it was felt that a more general tool might have a wider audience.

In January 2002, Howard Fukada proposed a new Par2 specification with two significant changes: data verification and repair should work on blocks of data rather than whole files, and the algorithm should use 16-bit numbers rather than the 8-bit numbers that PAR1 used. Michael Nahas and Peter Clements took up these ideas in July 2002, with additional input from Paul Nettle and Ryan Gallagher (who both wrote Par1 clients). Version 2.0 of the Parchive specification was published by Michael Nahas in September 2002.

Peter Clements then went on to write the first two Par2 implementations, QuickPar and par2cmdline. With par2cmdline abandoned since 2004, Paul Houle created phpar2 to supersede it. Yutaka Sawada created MultiPar to supersede QuickPar. MultiPar uses par2j.exe, which is partially based on par2cmdline's optimization techniques, as its backend engine.
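The index files described above let a client check a download before fetching any recovery data. A minimal sketch of that check follows. It assumes the index has already been parsed into a hypothetical mapping of file names to MD5 hex digests. Par2 does use MD5, but its real index packets carry more fields than this. Each listed file is classified as intact, damaged, or missing.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in chunks so large files fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def check_against_index(index, directory):
    """Compare files on disk with a hypothetical {name: md5_hex} index.

    Returns three lists of names: (intact, damaged, missing).
    """
    intact, damaged, missing = [], [], []
    for name, expected in index.items():
        path = Path(directory) / name
        if not path.is_file():
            missing.append(name)
        elif md5_of(path) == expected:
            intact.append(name)
        else:
            damaged.append(name)
    return intact, damaged, missing
```

The number of damaged and missing items is what tells a user how much recovery data to download.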


Versions

Versions 1 and 2 of the file format are incompatible. (However, many clients support both.)


Par1

For Par1, given source files ''f1'', ''f2'', ..., ''fn'', the Parchive consists of an index file (''f.par''), which is a CRC-type file with no recovery blocks, and a number of "parity volumes" (''f.p01'', ''f.p02'', etc.). If one of the original files (for example, ''f2'') is missing, it can be recreated from all of the other original files plus any one of the parity volumes. Likewise, two missing files can be recreated from any two of the parity volumes, and so forth. Par1 supports up to a total of 256 source and recovery files.
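The property just described, where any k parity volumes can rebuild any k missing files, is what an erasure code provides. Below is a minimal sketch of a Reed–Solomon-style code under stated assumptions: arithmetic in GF(2^8) with the reducing polynomial 0x11D, and a Cauchy coefficient matrix, chosen because every square submatrix of a Cauchy matrix is invertible, so any k recovery blocks can solve for any k missing ones. This is not the coefficient matrix defined by the Par1 or Par2 specifications, and `encode` and `recover` are hypothetical names. Each file is treated as one equal-length block, mirroring Par1's whole-file approach.

```python
# Assumptions (not from the Par1 spec): polynomial 0x11D, Cauchy matrix.

def _tables(poly=0x11D):
    """Build exp/log tables for GF(2^8); 2 generates the nonzero elements."""
    exp, log = [0] * 510, [0] * 256
    x = 1
    for i in range(255):
        exp[i], log[x] = x, i
        x <<= 1
        if x & 0x100:
            x ^= poly
    exp[255:] = exp[:255]  # duplicated so LOG[a] + LOG[b] needs no modulo
    return exp, log


EXP, LOG = _tables()


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_inv(a):
    return EXP[255 - LOG[a]]


def coef(n, j, i):
    """Cauchy entry 1 / (x_j + y_i) with x_j = n + j, y_i = i (+ is XOR)."""
    return gf_inv((n + j) ^ i)


def add_scaled(dst, c, src):
    """dst += c * src byte by byte, in place (addition in GF(2^8) is XOR)."""
    for pos, v in enumerate(src):
        dst[pos] ^= gf_mul(c, v)


def encode(data, m):
    """Build m recovery blocks from n equal-length data blocks."""
    n = len(data)
    if n + m > 256:
        raise ValueError("GF(2^8) allows at most 256 data + recovery blocks")
    out = []
    for j in range(m):
        block = bytearray(len(data[0]))
        for i, d in enumerate(data):
            add_scaled(block, coef(n, j, i), d)
        out.append(bytes(block))
    return out


def recover(data, recovery):
    """data: blocks with None where missing; recovery: {index j: block}."""
    n = len(data)
    missing = [i for i, d in enumerate(data) if d is None]
    rows = sorted(recovery)[: len(missing)]
    if len(rows) < len(missing):
        raise ValueError("need as many recovery blocks as missing blocks")
    # Subtract the known blocks so each equation involves only unknowns.
    A, rhs = [], []
    for j in rows:
        b = bytearray(recovery[j])
        for i, d in enumerate(data):
            if d is not None:
                add_scaled(b, coef(n, j, i), d)
        A.append([coef(n, j, i) for i in missing])
        rhs.append(b)
    # Gauss-Jordan elimination; a Cauchy submatrix always yields a pivot.
    k = len(missing)
    for col in range(k):
        piv = next(r for r in range(col, k) if A[r][col])
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        s = gf_inv(A[col][col])
        A[col] = [gf_mul(s, v) for v in A[col]]
        rhs[col] = bytearray(gf_mul(s, v) for v in rhs[col])
        for r in range(k):
            f = A[r][col]
            if r != col and f:
                A[r] = [x ^ gf_mul(f, y) for x, y in zip(A[r], A[col])]
                add_scaled(rhs[r], f, rhs[col])
    restored = list(data)
    for row, i in enumerate(missing):
        restored[i] = bytes(rhs[row])
    return restored
```

For example, `encode(blocks, 3)` produces three recovery blocks, and any two of them repair any two lost data blocks. Par2 applies the same idea per block, using 16-bit rather than 8-bit symbols.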


Par2

Par2 files generally use this naming/extension system: ''filename.vol000+01.PAR2'', ''filename.vol001+02.PAR2'', ''filename.vol003+04.PAR2'', ''filename.vol007+06.PAR2'', etc. The number after the ''"+"'' in the filename indicates how many blocks it contains, and the number after ''"vol"'' indicates the number of the first recovery block within the PAR2 file. If an index file of a download states that 4 blocks are missing, the easiest way to repair the files would be to download ''filename.vol003+04.PAR2''. However, because any four recovery blocks are enough, ''filename.vol007+06.PAR2'' (which contains six) is also acceptable. There is also an index file, ''filename.PAR2'', which is identical in function to the small index file used in PAR1.

The Par2 specification supports up to 32,768 source blocks and up to 65,535 recovery blocks. Input files are split into multiple equal-sized blocks so that recovery files do not need to be the size of the largest input file. Although Unicode is mentioned in the PAR2 specification as an option, most PAR2 implementations do not support Unicode. Directory support is included in the PAR2 specification, but most or all implementations do not support it.
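The naming convention above is enough to decide which volumes to fetch. Here is a minimal sketch, assuming the `.volNNN+MM.PAR2` pattern described above. `pick_volumes` and its selection policy are hypothetical and not taken from any particular client: it prefers the smallest single volume that is large enough, and otherwise takes the largest volumes first.

```python
import re

# Matches the ".volNNN+MM.PAR2" suffix described above (case-insensitive).
VOLUME_RE = re.compile(r"\.vol(\d+)\+(\d+)\.par2$", re.IGNORECASE)


def parse_volume(name):
    """Return (first_block, block_count) for a recovery volume, or None."""
    m = VOLUME_RE.search(name)
    return (int(m.group(1)), int(m.group(2))) if m else None


def pick_volumes(names, missing_blocks):
    """Choose recovery volumes holding at least missing_blocks blocks."""
    if missing_blocks <= 0:
        return []
    volumes = []
    for name in names:
        info = parse_volume(name)
        if info is not None:  # skips the index file, e.g. filename.PAR2
            volumes.append((info[1], name))
    # Hypothetical policy, step 1: smallest single volume that suffices.
    big_enough = sorted(v for v in volumes if v[0] >= missing_blocks)
    if big_enough:
        return [big_enough[0][1]]
    # Step 2: otherwise add volumes largest-first until the total suffices.
    chosen, total = [], 0
    for count, name in sorted(volumes, reverse=True):
        if total >= missing_blocks:
            break
        chosen.append(name)
        total += count
    if total < missing_blocks:
        raise ValueError("not enough recovery blocks to repair")
    return chosen
```

With the example names above, 4 missing blocks select `filename.vol003+04.PAR2`, and 5 missing blocks select `filename.vol007+06.PAR2`.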


Par3

The Par3 specification was originally planned to be published as an enhancement over the Par2 specification. However, to date, it has remained closed source by its specification owner, Yutaka Sawada. A discussion on a new format started in the GitHub issue section of the maintained par2cmdline fork on January 29, 2019. The discussion led to a new format, which is also named Par3. The new Par3 format's specification is published on GitHub but remained an alpha draft as of January 28, 2022. The specification is written by Michael Nahas, the author of the Par2 specification, with help from Yutaka Sawada, animetosho and malaire. The new format claims to have multiple advantages over the Par2 format, including:
* Supports more than 2^16 files and more than 2^16 blocks.
* Supports packing small files into one block, as well as deduplication when a block appears in multiple files.
* Supports UTF-8 file names, file permissions, hard links and symbolic links.
* Supports embedding PAR data inside other formats, like ZIP archives or ISO disk images.
* Supports "incremental backups", where a user creates recovery files for some file or folder, changes some data, and creates new recovery files that reuse some of the older files.
* Supports more error correction code algorithms (such as LDPC and sparse random matrix).
* Replaces the MD5 hash function used in Par2 with BLAKE3.
* Supports empty directories.
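Block-level deduplication, one of the Par3 features listed above, can be sketched as follows. This is a hypothetical illustration, not Par3's actual data layout. Files are cut into fixed-size blocks, each block is hashed, and identical blocks are stored only once. `hashlib.blake2b` stands in for BLAKE3, which Python's standard library does not include.

```python
import hashlib


def dedup_blocks(files, block_size):
    """Split files into fixed-size blocks and keep each distinct block once.

    files: mapping of name -> bytes. Returns (store, layout): store maps a
    digest to block bytes; layout maps each name to its ordered digests.
    """
    store, layout = {}, {}
    for name, data in files.items():
        digests = []
        for start in range(0, len(data), block_size):
            block = data[start:start + block_size]
            # blake2b is a stand-in for BLAKE3 (not in the standard library).
            digest = hashlib.blake2b(block, digest_size=16).hexdigest()
            store.setdefault(digest, block)  # repeated blocks stored once
            digests.append(digest)
        layout[name] = digests
    return store, layout


def rebuild(store, layout, name):
    """Reassemble one file from the shared block store."""
    return b"".join(store[d] for d in layout[name])
```

Two files that share most of their content then need recovery data for the shared blocks only once.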


Software


Multi-Platform


* par2+tbb (GPLv2): a concurrent (multithreaded) version of par2cmdline 0.4 using TBB. Only compatible with x86-based CPUs. It is available in the FreeBSD Ports system as par2cmdline-tbb.
* Original par2cmdline (obsolete). Available in the FreeBSD Ports system as par2cmdline.
* par2cmdline: maintained fork by BlackIkeEagle.
* par2cmdline-mt: another multithreaded version of par2cmdline using OpenMP (GPLv2 or later). It has since been merged into BlackIkeEagle's fork and is maintained there.
* ParPar (CC0): a high-performance, multithreaded PAR2 client and Node.js library. It does not support verification or repair; it can currently only create PAR2 archives.
* par2deep (LGPL-3.0): produces, verifies and repairs par2 files recursively, either on the command line or through a graphical user interface. It is available on the Python Package Index as par2deep.


Windows


* MultiPar (freeware): builds upon QuickPar's features and GUI, and uses Yutaka Sawada's par2j.exe as the PAR2 backend. MultiPar supports multiple languages through Unicode; its name was derived from "multi-lingual PAR client". MultiPar has also been verified to work with Wine under TrueOS and Ubuntu, and may work with other operating systems too. Although the Par2 components are (or will be) open source, the MultiPar GUI on top of them is currently not open source.
* QuickPar (freeware): unmaintained since 2004, superseded by MultiPar.
* phpar2: advanced par2cmdline with multithreading and highly optimized assembler code (about 66% faster than QuickPar 0.9.1).
* First PAR implementation, unmaintained since 2001.


Mac OS X




* UnRarX


POSIX

Software for POSIX-conforming operating systems:
* Par2 for KDE 4
* PyPar2 1.4, a frontend for par2
* GPar2 2.03


See also

* Comparison of file archivers – some file archivers can integrate parity data into their formats for error detection and correction.
* RAID – RAID levels at and above RAID 5 use parity data to detect and repair errors.




External links


Parchive project - full specifications and math behind it



Slyck's Guide To The Usenet Newsgroups: PAR & PAR2 Files

Guide to repair files using PAR2

UsenetReviewz's guide to opening par files