ZIP is an
archive file format
In computing, an archive file is a computer file that is composed of one or more files along with metadata. Archive files are used to collect multiple data files together into a single file for easier portability and storage, or simply to compres ...
that supports
lossless data compression
Lossless compression is a class of data compression that allows the original data to be perfectly reconstructed from the compressed data with no loss of information. Lossless compression is possible because most real-world data exhibits statistic ...
. A ZIP file may contain one or more files or directories that may have been compressed. The ZIP file format permits a number of compression
algorithms
In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific problems or to perform a computation. Algorithms are used as specifications for performing c ...
, though
DEFLATE is the most common. This format was originally created in 1989 and was first implemented in
PKWARE, Inc.
PKWARE, Inc. is an enterprise data protection software company that provides discovery, classification, masking and encryption solutions, along with data compression software, used by organizations in financial services, manufacturing, militar ...
's
PKZIP
PKZIP is a file archiving computer program, notable for introducing the popular ZIP file format. PKZIP was first introduced for MS-DOS on the IBM-PC compatible platform in 1989. Since then versions have been released for a number of other ...
utility, as a replacement for the previous
ARC
ARC may refer to:
Business
* Aircraft Radio Corporation, a major avionics manufacturer from the 1920s to the '50s
* Airlines Reporting Corporation, an airline-owned company that provides ticket distribution, reporting, and settlement services
* ...
compression format by Thom Henderson. The ZIP format was then quickly supported by many software utilities other than PKZIP. Microsoft has included built-in ZIP support (under the name "compressed folders") in versions of
Microsoft Windows
Windows is a group of several proprietary graphical operating system families developed and marketed by Microsoft. Each family caters to a certain sector of the computing industry. For example, Windows NT for consumers, Windows Server for serv ...
since 1998 via the "Windows Plus!" addon for Windows 98. Native support was added as of the year 2000 in Windows ME. Apple has included built-in ZIP support in
Mac OS X
macOS (; previously OS X and originally Mac OS X) is a Unix operating system developed and marketed by Apple Inc. since 2001. It is the primary operating system for Apple's Mac (computer), Mac computers. Within the market of ...
10.3 (via BOMArchiveHelper, now
Archive Utility
This is a list of macOS built-in apps and system components.
Applications
App Store
The Mac App Store is macOS's digital distribution platform for macOS apps, created and maintained by Apple Inc. based on the iOS version, the platform was an ...
) and later. Most
free operating systems have built in support for ZIP in similar manners to Windows and Mac OS X.
ZIP files generally use the
file extension
A filename extension, file name extension or file extension is a suffix to the name of a computer file (e.g., .txt, .docx, .md). The extension indicates a characteristic of the file contents or its intended use. A filename extension is typically d ...
s or and the
MIME
Multipurpose Internet Mail Extensions (MIME) is an Internet standard that extends the format of email messages to support text in character sets other than ASCII, as well as attachments of audio, video, images, and application programs. Message ...
media type .
[ ZIP is used as a base file format by many programs, usually under a different name. When navigating a file system via a user interface, graphical ]icons
An icon () is a religious work of art, most commonly a painting, in the cultures of the Eastern Orthodox, Oriental Orthodox, and Catholic churches. They are not simply artworks; "an icon is a sacred image used in religious devotion". The most ...
representing ZIP files often appear as a document or other object prominently featuring a zipper
A zipper, zip, fly, or zip fastener, formerly known as a clasp locker, is a commonly used device for binding together two edges of textile, fabric or other flexible material. Used in clothing (e.g. jackets and jeans), luggage and other Bag, ba ...
.
History
The file format was designed by Phil Katz
Phillip Walter Katz (November 3, 1962 – April 14, 2000) was a computer programmer best known as the co-creator of the Zip file format for data compression, and the author of PKZIP, a program for creating zip files that ran under DOS. A c ...
of PKWARE
PKWARE, Inc. is an enterprise data protection software company that provides discovery, classification, masking and encryption solutions, along with data compression software, used by organizations in financial services, manufacturing, militar ...
and Gary Conway of Infinity Design Concepts. The format was created after Systems Enhancement Associates (SEA) filed a lawsuit
-
A lawsuit is a proceeding by a party or parties against another in the civil court of law. The archaic term "suit in law" is found in only a small number of laws still in effect today. The term "lawsuit" is used in reference to a civil actio ...
against PKWARE claiming that the latter's archiving products, named PKARC, were derivatives of SEA's ARC
ARC may refer to:
Business
* Aircraft Radio Corporation, a major avionics manufacturer from the 1920s to the '50s
* Airlines Reporting Corporation, an airline-owned company that provides ticket distribution, reporting, and settlement services
* ...
archiving system. The name "zip" (meaning "move at high speed") was suggested by Katz's friend, Robert Mahoney. They wanted to imply that their product would be faster than ARC
ARC may refer to:
Business
* Aircraft Radio Corporation, a major avionics manufacturer from the 1920s to the '50s
* Airlines Reporting Corporation, an airline-owned company that provides ticket distribution, reporting, and settlement services
* ...
and other compression formats of the time. By distributing the zip file format within APPNOTE.TXT, compatibility with the zip file format proliferated widely on the public Internet during the 1990s.
PKWARE and Infinity Design Concepts made a joint press release on February 14, 1989, releasing the file format into the public domain
The public domain (PD) consists of all the creative work
A creative work is a manifestation of creative effort including fine artwork (sculpture, paintings, drawing, sketching, performance art), dance, writing (literature), filmmaking, ...
.
Version history
The .ZIP File Format Specification has its own version number, which does not necessarily correspond to the version numbers for the PKZIP tool, especially with PKZIP 6 or later. At various times, PKWARE has added preliminary features that allow PKZIP products to extract archives using advanced features, but PKZIP products that create such archives are not made available until the next major release. Other companies or organizations support the PKWARE specifications at their own pace.
The .ZIP file format specification is formally named "APPNOTE - .ZIP File Format Specification" and it is published on the PKWARE.com website since the late 1990s. Several versions of the specification were not published. Specifications of some features such as BZIP2 compression, strong encryption specification and others were published by PKWARE a few years after their creation. The URL of the online specification was changed several times on the PKWARE website.
A summary of key advances in various versions of the PKWARE specification:
* 2.0: (1993)[ File entries can be compressed with DEFLATE and use traditional PKWARE encryption (ZipCrypto).
* 2.1: (1996) Deflate64 compression
* 4.5: (2001)] Documented 64-bit zip format.
* 4.6: (2001) BZIP2 compression (not published online until the publication of APPNOTE 5.2)
* 5.0: (2002) SES: DES
Des is a masculine given name, mostly a short form (hypocorism) of Desmond. People named Des include:
People
* Des Buckingham, English football manager
* Des Corcoran, (1928–2004), Australian politician
* Des Dillon (disambiguation), sever ...
, Triple DES
In cryptography, Triple DES (3DES or TDES), officially the Triple Data Encryption Algorithm (TDEA or Triple DEA), is a symmetric-key block cipher, which applies the DES cipher algorithm three times to each data block. The Data Encryption Standa ...
, RC2
In cryptography, RC2 (also known as ARC2) is a symmetric-key block cipher designed by Ron Rivest in 1987. "RC" stands for "Ron's Code" or "Rivest Cipher"; other ciphers designed by Rivest include RC4, RC5, and RC6.
The development of RC2 was ...
, RC4
In cryptography, RC4 (Rivest Cipher 4, also known as ARC4 or ARCFOUR, meaning Alleged RC4, see below) is a stream cipher. While it is remarkable for its simplicity and speed in software, multiple vulnerabilities have been discovered in RC4, ren ...
supported for encryption (not published online until the publication of APPNOTE 5.2)
* 5.2: (2003) AES encryption support for SES (defined in APPNOTE 5.1 that was not published online) and AES from WinZip ("AE-x"); corrected version of RC2-64 supported for SES encryption.
* 6.1: (2004) Documented certificate storage.
* 6.2.0: (2004) Documented Central Directory Encryption.
* 6.3.0: (2006) Documented Unicode (UTF-8
UTF-8 is a variable-width encoding, variable-length character encoding used for electronic communication. Defined by the Unicode Standard, the name is derived from ''Unicode'' (or ''Universal Coded Character Set'') ''Transformation Format 8-bit'' ...
) filename storage. Expanded list of supported compression algorithms ( LZMA, PPMd+), encryption algorithms (Blowfish
Tetraodontidae is a family of primarily marine and estuarine fish of the order Tetraodontiformes. The family includes many familiar species variously called pufferfish, puffers, balloonfish, blowfish, blowies, bubblefish, globefish, swellfis ...
, Twofish
In cryptography, Twofish is a symmetric key block cipher with a block size of 128 bits and key sizes up to 256 bits. It was one of the five finalists of the Advanced Encryption Standard contest, but it was not selected for standardization. Twof ...
), and hashes.
* 6.3.1: (2007) Corrected standard hash values for SHA-256/384/512.
* 6.3.2: (2007) Documented compression method 97 (WavPack
WavPack is a free and open-source lossless audio compression format and application implementing the format. It is unique in the way that it supports hybrid audio compression alongside normal compression which is similar to how FLAC works. I ...
).
* 6.3.3: (2012) Document formatting changes to facilitate referencing the PKWARE Application Note from other standards using methods such as the JTC 1 Referencing Explanatory Report (RER) as directed by JTC 1/SC 34 N 1621.
* 6.3.4: (2014) Updates the PKWARE, Inc. office address.
* 6.3.5: (2018) Documented compression methods 16, 96 and 99, DOS timestamp epoch and precision, added extra fields for keys and decryption, as well as typos and clarifications.
* 6.3.6: (2019) Corrected typographical error.
* 6.3.7: (2020) Added Zstandard
Zstandard, commonly known by the name of its reference implementation zstd, is a lossless data compression algorithm developed by Yann Collet at Facebook.
''Zstd'' is the reference implementation in C. Version 1 of this implementation was re ...
compression method ID 20.
* 6.3.8: (2020) Moved Zstandard compression method ID from 20 to 93, deprecating the former. Documented method IDs 94 and 95 (MP3
MP3 (formally MPEG-1 Audio Layer III or MPEG-2 Audio Layer III) is a coding format for digital audio developed largely by the Fraunhofer Society in Germany, with support from other digital scientists in the United States and elsewhere. Origin ...
and XZ respectively).
* 6.3.9: (2020) Corrected a typo in Data Stream Alignment description.
* 6.3.10: (2022) Added several z/OS attribute values for APPENDIX B. Added several additional 3rd party Extra Field mappings.
WinZip
WinZip is a trialware file archiver and data compression, compressor for Microsoft Windows, macOS, iOS and Android (operating system), Android. It is developed by WinZip Computing (formerly Nico Mak Computing), which is owned by Corel, Corel Co ...
, starting with version 12.1, uses the extension for ZIP files that use compression methods newer than DEFLATE; specifically, methods BZip, LZMA, PPMd, Jpeg and Wavpack. The last 2 are applied to appropriate file types when "Best method" compression is selected.
Standardization
In April 2010, ISO/IEC JTC 1
ISO/IEC JTC 1, entitled "Information technology", is a joint technical committee (JTC) of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). Its purpose is to develop, maintain and pr ...
initiated a ballot to determine whether a project should be initiated to create an ISO/IEC International Standard format compatible with ZIP. The proposed project, entitled ''Document Packaging'', envisaged a ZIP-compatible 'minimal compressed archive format' suitable for use with a number of existing standards including OpenDocument
The Open Document Format for Office Applications (ODF), also known as OpenDocument, is an open file format for word processing documents, spreadsheets, presentations and graphics and using ZIP-compressed XML files. It was developed wi ...
, Office Open XML
Office Open XML (also informally known as OOXML) is a zipped, XML-based file format developed by Microsoft for representing spreadsheets, charts, presentations and word processing documents. Ecma International standardized the initial version a ...
and EPUB
EPUB is an e-book file format that uses the ".epub" file extension. The term is short for ''electronic publication'' and is sometimes styled ''ePub''. EPUB is supported by many e-readers, and compatible software is available for most smartphones ...
.
In 2015, ISO/IEC 21320-1 "Document Container File — Part 1: Core" was published which states that "Document container files are conforming Zip files". It requires the following main restrictions of the ZIP file format:
* Files in ZIP archives may only be stored uncompressed, or using the "deflate" compression (i.e. compression method may contain the value "0" - stored or "8" - deflated).
* The encryption features are prohibited.
* The digital signature features (from SES) are prohibited.
* The "patched data" features (from PKPatchMaker) are prohibited.
* Archives may not span multiple volumes or be segmented.
Design
files are archives that store multiple files. ZIP allows contained files to be compressed using many different methods, as well as simply storing a file without compressing it. Each file is stored separately, allowing different files in the same archive to be compressed using different methods. Because the files in a ZIP archive are compressed individually, it is possible to extract them, or add new ones, without applying compression or decompression to the entire archive. This contrasts with the format of compressed tar
Tar is a dark brown or black viscous liquid of hydrocarbons and free carbon, obtained from a wide variety of organic materials through destructive distillation. Tar can be produced from coal, wood, petroleum, or peat. "a dark brown or black bit ...
files, for which such random-access processing is not easily possible.
A directory is placed at the end of a ZIP file. This identifies what files are in the ZIP and identifies where in the ZIP that file is located. This allows ZIP readers to load the list of files without reading the entire ZIP archive. ZIP archives can also include extra data that is not related to the ZIP archive. This allows for a ZIP archive to be made into a self-extracting archive (application that decompresses its contained data), by prepending the program code to a ZIP archive and marking the file as executable. Storing the catalog at the end also makes possible hiding a zipped file by appending it to an innocuous file, such as a GIF image file.
The format uses a 32-bit CRC algorithm and includes two copies of the directory structure of the archive to provide greater protection against data loss.
Structure
A ZIP file is correctly identified by the presence of an ''end of central directory record'' which is located at the end of the archive structure in order to allow the easy appending of new files. If the end of central directory record indicates a non-empty archive, the name of each file or directory within the archive should be specified in a ''central directory'' entry, along with other metadata about the entry, and an offset into the ZIP file, pointing to the actual entry data. This allows a file listing of the archive to be performed relatively quickly, as the entire archive does not have to be read to see the list of files. The entries within the ZIP file also include this information, for redundancy, in a ''local file header''. Because ZIP files may be appended to, only files specified in the central directory at the end of the file are valid. Scanning a ZIP file for local file headers is invalid (except in the case of corrupted archives), as the central directory may declare that some files have been deleted and other files have been updated.
For example, we may start with a ZIP file that contains files A, B and C. File B is then deleted and C updated. This may be achieved by just appending a new file C to the end of the original ZIP file and adding a new central directory that only lists file A and the new file C. When ZIP was first designed, transferring files by floppy disk was common, yet writing to disks was very time-consuming. If you had a large zip file, possibly spanning multiple disks, and only needed to update a few files, rather than reading and re-writing all the files, it would be substantially faster to just read the old central directory, append the new files then append an updated central directory.
The order of the file entries in the central directory need not coincide with the order of file entries in the archive.
Each entry stored in a ZIP archive is introduced by a ''local file header'' with information about the file such as the comment, file size and file name, followed by optional "extra" data fields, and then the possibly compressed, possibly encrypted file data. The "Extra" data fields are the key to the extensibility of the ZIP format. "Extra" fields are exploited to support the ZIP64 format, WinZip-compatible AES encryption, file attributes, and higher-resolution NTFS or Unix file timestamps. Other extensions are possible via the "Extra" field. ZIP tools are required by the specification to ignore Extra fields they do not recognize.
The ZIP format uses specific 4-byte "signatures" to denote the various structures in the file. Each file entry is marked by a specific signature. The end of central directory record is indicated with its specific signature, and each entry in the central directory starts with the 4-byte ''central file header signature''.
There is no BOF or EOF marker in the ZIP specification. Conventionally the first thing in a ZIP file is a ZIP entry, which can be identified easily by its ''local file header signature''. However, this is not necessarily the case, as this not required by the ZIP specification - most notably, a self-extracting archive will begin with an executable file header.
Tools that correctly read ZIP archives must scan for the end of central directory record signature, and then, as appropriate, the other, indicated, central directory records. They must not scan for entries from the top of the ZIP file, because (as previously mentioned in this section) only the central directory specifies where a file chunk starts and that it has not been deleted. Scanning could lead to false positives, as the format does not forbid other data to be between chunks, nor file data streams from containing such signatures. However, tools that attempt to recover data from damaged ZIP archives will most likely scan the archive for local file header signatures; this is made more difficult by the fact that the compressed size of a file chunk may be stored after the file chunk, making sequential processing difficult.
Most of the signatures end with the short integer 0x4b50, which is stored in little-endian
In computing, endianness, also known as byte sex, is the order or sequence of bytes of a word of digital data in computer memory. Endianness is primarily expressed as big-endian (BE) or little-endian (LE). A big-endian system stores the most si ...
ordering. Viewed as an ASCII
ASCII ( ), abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Because of ...
string this reads "PK", the initials of the inventor Phil Katz. Thus, when a ZIP file is viewed in a text editor the first two bytes of the file are usually "PK". (DOS, OS/2 and Windows self-extracting ZIPs have an EXE
Exe or EXE may refer to:
* .exe, a file extension
* exe., abbreviation for executive
Places
* River Exe, in England
* Exe Estuary, in England
* Exe Island, in Exeter, England
Transportation and vehicles
* Exe (locomotive), a British locomotive
...
before the ZIP so start with "MZ"; self-extracting ZIPs for other operating systems may similarly be preceded by executable code for extracting the archive's content on that platform.)
The specification also supports spreading archives across multiple file-system files. Originally intended for storage of large ZIP files across multiple floppy disk
A floppy disk or floppy diskette (casually referred to as a floppy, or a diskette) is an obsolescent type of disk storage composed of a thin and flexible disk of a magnetic storage medium in a square or nearly square plastic enclosure lined w ...
s, this feature is now used for sending ZIP archives in parts over email, or over other transports or removable media.
The FAT filesystem
File Allocation Table (FAT) is a file system developed for personal computers. Originally developed in 1977 for use on floppy disks, it was adapted for use on hard disks and other devices. It is often supported for compatibility reasons by ...
of DOS has a timestamp resolution of only two seconds; ZIP file records mimic this. As a result, the built-in timestamp resolution of files in a ZIP archive is only two seconds, though extra fields can be used to store more precise timestamps. The ZIP format has no notion of time zone
A time zone is an area which observes a uniform standard time for legal, Commerce, commercial and social purposes. Time zones tend to follow the boundaries between Country, countries and their Administrative division, subdivisions instead of ...
, so timestamps are only meaningful if it is known what time zone they were created in.
In September 2006, PKWARE released a revision of the ZIP specification providing for the storage of file names using UTF-8
UTF-8 is a variable-width encoding, variable-length character encoding used for electronic communication. Defined by the Unicode Standard, the name is derived from ''Unicode'' (or ''Universal Coded Character Set'') ''Transformation Format 8-bit'' ...
, finally adding Unicode compatibility to ZIP.
File headers
All multi-byte values in the header are stored in little-endian
In computing, endianness, also known as byte sex, is the order or sequence of bytes of a word of digital data in computer memory. Endianness is primarily expressed as big-endian (BE) or little-endian (LE). A big-endian system stores the most si ...
byte order. All length fields count the length in bytes.
Local file header
The extra field contains a variety of optional data such as OS-specific attributes. It is divided into records, each with at minimum a 16-bit signature and a 16-bit length. A ZIP64 local file extra field record, for example, has the signature 0x0001 and a length of 16 bytes (or more) so that two 64-bit values (the uncompressed and compressed sizes) may follow. Another common local file extension is 0x5455 (or "UT") which contains 32-bit UTC UNIX timestamps.
This is immediately followed by the compressed data.
Data descriptor
If the bit at offset 3 (0x08) of the general-purpose flags field is set, then the CRC-32 and file sizes are not known when the header is written. If the archive is in Zip64 format, the compressed and uncompressed size fields are 8 bytes long instead of 4 bytes long (see section 4.3.9.2). The equivalent fields in the local header (or in the Zip64 extended information extra field in the case of archives in Zip64 format) are filled with zero, and the CRC-32 and size are appended in a 12-byte structure (optionally preceded by a 4-byte signature) immediately after the compressed data:
Central directory file header
The central directory entry is an expanded form of the local header:
End of central directory record (EOCD)
After all the central directory entries comes the end of central directory (EOCD) record, which marks the end of the ZIP file:
This ordering allows a ZIP file to be created in one pass, but the central directory is also placed at the end of the file in order to facilitate easy removal of files from multiple-part ''(e.g. "multiple floppy-disk")'' archives, as previously discussed.
Compression methods
The .ZIP File Format Specification documents the following compression methods: Store (no compression), Shrink ( LZW), Reduce (levels 1–4; LZ77 + probabilistic), Implode, Deflate, Deflate64, bzip2, LZMA, WavPack
WavPack is a free and open-source lossless audio compression format and application implementing the format. It is unique in the way that it supports hybrid audio compression alongside normal compression which is similar to how FLAC works. I ...
, PPMd
Prediction by partial matching (PPM) is an adaptive statistical data compression technique based on context modeling and prediction. PPM models use a set of previous symbols in the uncompressed symbol stream to predict the next symbol in the stre ...
, and a LZ77 variant provided by IBM z/OS
z/OS is a 64-bit computing, 64-bit operating system for IBM z/Architecture Mainframe_computer, mainframes, introduced by IBM in October 2000. It derives from and is the successor to OS/390, which in turn was preceded by a string of MVS vers ...
CMPSC instruction. The most commonly used compression method is DEFLATE, which is described in IETF .
Other methods mentioned, but not documented in detail in the specification include: PKWARE DCL Implode (old IBM TERSE), new IBM TERSE, IBM LZ77 z Architecture (PFS), and a JPEG variant. A "Tokenize" method was reserved for a third party, but support was never added.
The word ''Implode'' is overused by PKWARE: the DCL/TERSE Implode is distinct from the old PKZIP Implode, a predecessor to Deflate. The DCL Implode is undocumented partially due to its proprietary nature held by IBM, but Mark Adler
Mark Adler (born 1959) is an American software engineer. He is best known for his work in the field of data compression as the author of the Adler-32 checksum function, and a co-author together with Jean-loup Gailly of the zlib compression librar ...
has nevertheless provided a decompressor called "blast" alongside zlib.
Encryption
ZIP supports a simple password
A password, sometimes called a passcode (for example in Apple devices), is secret data, typically a string of characters, usually used to confirm a user's identity. Traditionally, passwords were expected to be memorized, but the large number of ...
-based symmetric encryption
Symmetric-key algorithms are algorithms for cryptography that use the same cryptographic keys for both the encryption of plaintext and the decryption of ciphertext. The keys may be identical, or there may be a simple transformation to go between th ...
system generally known as ZipCrypto. It is documented in the ZIP specification, and known to be seriously flawed. In particular, it is vulnerable to known-plaintext attack
The known-plaintext attack (KPA) is an attack model for cryptanalysis where the attacker has access to both the plaintext (called a crib), and its encrypted version (ciphertext). These can be used to reveal further secret information such as secr ...
s, which are in some cases made worse by poor implementations of random-number generator
Random number generation is a process by which, often by means of a random number generator (RNG), a sequence of numbers or symbols that cannot be reasonably predicted better than by random chance is generated. This means that the particular outc ...
s. Computers running under native Microsoft Windows
Windows is a group of several proprietary graphical operating system families developed and marketed by Microsoft. Each family caters to a certain sector of the computing industry. For example, Windows NT for consumers, Windows Server for serv ...
without third-party archivers can open, but not create, ZIP files encrypted with ZipCrypto, but cannot extract the contents of files using different encryption.
New features including new compression
Compression may refer to:
Physical science
*Compression (physics), size reduction due to forces
*Compression member, a structural element such as a column
*Compressibility, susceptibility to compression
*Gas compression
*Compression ratio, of a c ...
and encryption
In cryptography, encryption is the process of encoding information. This process converts the original representation of the information, known as plaintext, into an alternative form known as ciphertext. Ideally, only authorized parties can decip ...
(e.g. AES) methods have been documented in the ZIP File Format Specification since version 5.2. A WinZip
WinZip is a trialware file archiver and data compression, compressor for Microsoft Windows, macOS, iOS and Android (operating system), Android. It is developed by WinZip Computing (formerly Nico Mak Computing), which is owned by Corel, Corel Co ...
-developed AES-based open standard ("AE-x" in APPNOTE) is used also by 7-Zip
7-Zip is a free and open-source file archiver, a utility used to place groups of files within compressed containers known as "archives". It is developed by Igor Pavlov and was first released in 1999. 7-Zip has its own archive format called 7z, ...
and Xceed, but some vendors use other formats. PKWARE SecureZIP (SES, proprietary) also supports RC2, RC4, DES, Triple DES encryption methods, Digital Certificate-based encryption and authentication (X.509
In cryptography, X.509 is an International Telecommunication Union (ITU) standard defining the format of public key certificates. X.509 certificates are used in many Internet protocols, including TLS/SSL, which is the basis for HTTPS, the secu ...
), and archive header encryption. It is, however, patented (see ).
File name
A filename or file name is a name used to uniquely identify a computer file in a directory structure. Different file systems impose different restrictions on filename lengths.
A filename may (depending on the file system) include:
* name &ndas ...
encryption
In cryptography, encryption is the process of encoding information. This process converts the original representation of the information, known as plaintext, into an alternative form known as ciphertext. Ideally, only authorized parties can decip ...
is introduced in .ZIP File Format Specification 6.2, which encrypts metadata stored in Central Directory portion of an archive, but Local Header sections remain unencrypted. A compliant archiver can falsify the Local Header data when using Central Directory Encryption. As of version 6.2 of the specification, the Compression Method and Compressed Size fields within Local Header are not yet masked.
ZIP64
The original format had a 4 GB (232 bytes) limit on various things (uncompressed size of a file, compressed size of a file, and total size of the archive), as well as a limit of 65,535 (216-1) entries in a ZIP archive. In version 4.5 of the specification (which is not the same as v4.5 of any particular tool), PKWARE introduced the "ZIP64" format extensions to get around these limitations, increasing the limits to 16 EB (264 bytes). In essence, it uses a "normal" central directory entry for a file, followed by an optional "zip64" directory entry, which has the larger fields.
The format of the Local file header (LOC) and Central directory entry (CEN) are the same in ZIP and ZIP64. However, ZIP64 specifies an extra field that may be added to those records at the discretion of the compressor, whose purpose is to store values that do not fit in the classic LOC or CEN records. To signal that the actual values are stored in ZIP64 extra fields, they are set to 0xFFFF or 0xFFFFFFFF in the corresponding LOC or CEN record. If one entry does not fit into the classic LOC or CEN record, only that entry is required to be moved into a ZIP64 extra field. The other entries may stay in the classic record. Therefore, not all entries shown in the following table might be stored in a ZIP64 extra field. However, if they appear, their order must be as shown in the table.
On the other hand, the format of EOCD for ZIP64 is slightly different from the normal ZIP version.
It is also not necessarily the last record in the file. A End of Central Directory Locator follows (an additional 20 bytes at the end).
The File Explorer in Windows XP does not support ZIP64, but the Explorer in Windows Vista and later do. Likewise, some extension libraries support ZIP64, such as DotNetZip, QuaZIP and IO::Compress::Zip in Perl. Python
Python may refer to:
Snakes
* Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia
** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia
* Python (mythology), a mythical serpent
Computing
* Python (pro ...
's built-in zipfile supports it since 2.5 and defaults to it since 3.4. OpenJDK
OpenJDK (Open Java Development Kit) is a free and open-source implementation of the Java Platform, Standard Edition (Java SE). It is the result of an effort Sun Microsystems began in 2006. The implementation is licensed under the GPL-2.0-only wi ...
's built-in java.util.zip supports ZIP64 from version Java 7
The Java language has undergone several changes since JDK 1.0 as well as numerous additions of classes and packages to the standard library. Since J2SE 1.4, the evolution of the Java language has been governed by the Java Community P ...
. Android Java API support ZIP64 since Android 6.0. Mac OS Sierra's Archive Utility notably does not support ZIP64, and can create corrupt archives when ZIP64 would be required. However, the ditto command shipped with Mac OS will unzip ZIP64 files. More recent versions of Mac OS ship with info-zip's zip and unzip command line tools which do support Zip64: to verify run zip -v and look for "ZIP64_SUPPORT".
Combination with other file formats
The file format allows for a comment containing up to 65,535 (216−1) bytes of data to occur at the end of the file after the central directory. Also, because the central directory specifies the offset of each file in the archive with respect to the start, it is possible for the first file entry to start at an offset other than zero, although some tools, for example gzip
gzip is a file format and a software application used for file compression and decompression. The program was created by Jean-loup Gailly and Mark Adler as a free software replacement for the compress program used in early Unix systems, and in ...
, will not process archive files that do not start with a file entry at offset zero.
This allows arbitrary data to occur in the file both before and after the ZIP archive data, and for the archive to still be read by a ZIP application. A side-effect of this is that it is possible to author a file that is both a working ZIP archive and another format, provided that the other format tolerates arbitrary data at its end, beginning, or middle. Self-extracting archives
A self-extracting archive (SFX or SEA) is a computer executable program which contains compressed data in an archive file combined with machine-executable program instructions to extract this information on a compatible operating system and w ...
(SFX), of the form supported by WinZip, take advantage of this, in that they are executable () files that conform to the PKZIP AppNote.txt specification, and can be read by compliant zip tools or libraries.
This property of the format, and of the JAR
A jar is a rigid, cylindrical or slightly conical container, typically made of glass, ceramic, or plastic, with a wide mouth or opening that can be closed with a lid, screw cap, lug cap, cork stopper, roll-on cap, crimp-on cap, press-on c ...
format which is a variant of ZIP, can be exploited to hide rogue content (such as harmful Java classes) inside a seemingly harmless file, such as a GIF image uploaded to the web. This so-called GIFAR
Graphics Interchange Format Java Archives (GIFAR) is a term for GIF files combined with the JAR file format. GIFARs could be uploaded to Web sites that allow image uploading, and then run as though they were part of the legitimate code of that sit ...
exploit has been demonstrated as an effective attack against web applications such as Facebook.
Limits
The minimum size of a file is 22 bytes. Such an ''empty zip file'' contains only an End of Central Directory Record (EOCD):
The maximum size for both the archive file and the individual files inside it is 4,294,967,295 bytes (232−1 bytes, or 4 GB minus 1 byte) for standard ZIP. For ZIP64, the maximum size is 18,446,744,073,709,551,615 bytes (264−1 bytes, or 16 EB minus 1 byte).
Proprietary extensions
Extra field
file format includes an extra field facility within file headers, which can be used to store extra data not defined by existing ZIP specifications, and which allow compliant archivers that do not recognize the fields to safely skip them. Header IDs 0–31 are reserved for use by PKWARE. The remaining IDs can be used by third-party vendors for proprietary usage.
Strong encryption controversy
When WinZip
WinZip is a trialware file archiver and data compression, compressor for Microsoft Windows, macOS, iOS and Android (operating system), Android. It is developed by WinZip Computing (formerly Nico Mak Computing), which is owned by Corel, Corel Co ...
9.0 public beta was released in 2003, WinZip introduced its own AES-256
The Advanced Encryption Standard (AES), also known by its original name Rijndael (), is a specification for the encryption of electronic data established by the U.S. National Institute of Standards and Technology (NIST) in 2001.
AES is a variant ...
encryption, using a different file format, along with the documentation for the new specification. The encryption standards themselves were not