LHA (file format)

LHA or LZH is a freeware compression utility and associated file format. It was created in 1988 by Haruyasu Yoshizaki, a doctor, and was originally named LHarc. A complete rewrite of LHarc, tentatively named ''LHx'', was eventually released as ''LH''. It was then renamed ''LHA'' to avoid conflicting with the then-new MS-DOS 5.0 LH ("load high") command. The original LHA and its Windows port, LHA32, are no longer in development because Yoshizaki is busy at work.

Although no longer much used in the west, LHA remained popular in Japan until the 2000s. It was used by id Software to compress installation files for their earlier games, including ''Doom'' and ''Quake''. Because some versions of LHA have been distributed with source code under a permissive license, LHA has been ported to many operating systems and is still the main archiving format used on the Amiga computer, although it competed with LZX in the mid-1990s. This was due to Aminet, the world's largest archive of Amiga-related software and files, standardising on Stefan Boberg's implementation of LHA for the Amiga.

Microsoft released the Microsoft Compressed (LZH) Folder Add-on, which was designed for the Japanese version of Windows XP. The Japanese version of Windows 7 ships with the LZH folder add-on built in. Users of non-Japanese versions of Windows 7 Enterprise and Ultimate can also install the LZH folder add-on by installing the optional Japanese language pack from Windows Update.


Compression methods

In an LZH archive, the compression method is stored as a five-byte text string, e.g. -lh5-. These are the third through seventh bytes of the file.
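As a minimal sketch of the layout described above, the following Python reads the third through seventh bytes of a stream and returns them as the method string. The function name `read_method_id`, the hyphen check, and the sample bytes are assumptions made for illustration; they are not taken from any particular LHA implementation.

```python
import io


def read_method_id(stream):
    """Return the five-byte method ID stored at file offsets 2 through 6."""
    head = stream.read(7)
    if len(head) < 7:
        raise ValueError("too short to contain an LZH method ID")
    method = head[2:7]
    # Assumption: IDs are ASCII text wrapped in hyphens, as in "-lh5-".
    if not (method.startswith(b"-") and method.endswith(b"-")):
        raise ValueError(f"unexpected method ID: {method!r}")
    return method.decode("ascii")


# Hypothetical header prefix: two leading bytes, then the method ID.
sample = io.BytesIO(b"\x20\x00-lh5-" + b"\x00" * 16)
print(read_method_id(sample))  # -lh5-
```

A reader that dispatches on this string can reject unknown methods before attempting any decompression.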


Canonical LZH

LHarc compresses files using an algorithm from Yoshizaki's earlier LZHUF product, which was modified from LZARI, developed by Haruhiko Okumura, but uses Huffman coding instead of arithmetic coding. LZARI uses Lempel–Ziv–Storer–Szymanski (LZSS) with arithmetic coding.
;lh0
:No compression method is applied to the source data.
;lh1
:This method was introduced in LHarc version 1.
:It supports a 4 KiB sliding window and a maximum match length of 60 bytes. Dynamic Huffman encoding is used.
;lh2
:lh1 variant. This method supports an 8 KiB sliding window and a maximum match length of 256 bytes. Dynamic Huffman encoding is used.
;lh3
:lh2 variant with static Huffman encoding.
;lh4, lh5, lh6, lh7
:Methods 4, 5, 6 and 7 support sliding windows of 4, 8, 32 and 64 KiB respectively, with a maximum match length of 256 bytes. Static Huffman encoding is used. lh5 was first introduced in LHarc 2, followed by lh6 in LHA 2.66 (MS-DOS) and lh7 in LHA 2.67 beta (MS-DOS). LHA itself never compresses into lh4.
;lhd
:Technically not a compression method; it is used in .LZH archives to indicate that the compressed object is an empty directory.
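The parameters listed above can be collected into a lookup table. The sketch below is only an illustration of that idea: the names `Method`, `CANONICAL` and `window_bytes` are hypothetical, and the values are copied from the list in this section rather than from any implementation.

```python
from typing import NamedTuple, Optional


class Method(NamedTuple):
    window_kib: Optional[int]  # sliding window size in KiB, None if unused
    max_match: Optional[int]   # maximum match length in bytes
    huffman: Optional[str]     # "dynamic", "static", or None


# Values taken from the method list in this section.
CANONICAL = {
    "lh0": Method(None, None, None),  # stored, no compression
    "lh1": Method(4, 60, "dynamic"),
    "lh2": Method(8, 256, "dynamic"),
    "lh3": Method(8, 256, "static"),
    "lh4": Method(4, 256, "static"),
    "lh5": Method(8, 256, "static"),
    "lh6": Method(32, 256, "static"),
    "lh7": Method(64, 256, "static"),
}


def window_bytes(method_id: str) -> int:
    """Return the sliding window size in bytes for a canonical method."""
    name = method_id.strip("-")
    if name == "lhd":  # directory marker, not a compression method
        return 0
    method = CANONICAL[name]
    return 0 if method.window_kib is None else method.window_kib * 1024


print(window_bytes("-lh5-"))  # 8192
```

A decoder could use such a table to size its window buffer before reading any compressed data.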


Joe Jared extensions

Joe Jared extended LZSS to use larger dictionaries.
;lh8, lh9, lha, lhb, lhc, lhe
:Dictionary (sliding window) sizes are 64, 128, 256, 512, 1024 and 2048 KiB respectively. Jared ported LZH to Atari. The fact that lh8 is the same as lh7 was an oversight. Files using the larger-numbered methods may as well not exist, as Jared only considers them planned features.


UNLHA32 extensions

UNLHA32.DLL uses its own method for testing purposes.
;lhx
:It uses a 128–256 KiB dictionary.


PMarc extensions

These compression methods were created by PMarc, a CP/M archiver written by Miyo. The archive usually has a .PMA extension.
;pc1
:PopCom compressed executable archive. Details unknown.
;pm0
:No compression method is applied to the source data.
;pm1
:8 KB sliding window, static Huffman. Seldom generated; the decompressor is reverse-engineered.
;pm2
:lh5 variant with a 4 KB sliding window.
;pms
:Indicates a PMarc self-extracting archive. Should be skipped to reveal the real format.


LArc extensions

LArc uses the same file format as .LZH, but with the extension ".LZS". It was written by Kazuhiko Miki, Haruhiko Okumura and Ken Masuyama, and seems to have come before LZH. It uses a binary search tree for LZ matching.
;lzs
:It supports a 2 KiB sliding window and a maximum match length of 17 bytes.
;lz2
:Similar to lzs, except that the dictionary size and match length can be changed.
;lz3
:Unknown.
;lz4
:No compression method is applied to the source data.
;lz5
:It supports a 4 KiB sliding window and a maximum match length of 17 bytes.
;lz7
;lz8
:Unknown.
Common implementations appear to support only lzs, lz5, and the storage-only lz4.
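To illustrate what a sliding window and a maximum match length mean in practice, here is a generic LZSS-style decoding sketch. It works on already-parsed tokens, not on the real lzs or lz5 bitstream: the token tuples, the function name `lzss_decode`, and the default limits (2 KiB window, 17-byte matches, mirroring the lzs figures above) are assumptions made for illustration.

```python
def lzss_decode(tokens, window_size=2048, max_match=17):
    """Expand ("lit", byte) and ("match", distance, length) tokens."""
    out = bytearray()
    for token in tokens:
        if token[0] == "lit":
            out.append(token[1])
            continue
        _, distance, length = token
        # A match may only point back into data already produced,
        # and no further back than the sliding window allows.
        if not 1 <= distance <= min(window_size, len(out)):
            raise ValueError("distance outside the sliding window")
        if not 1 <= length <= max_match:
            raise ValueError("match length out of range")
        # Copy byte by byte so that overlapping matches repeat data.
        for _ in range(length):
            out.append(out[-distance])
    return bytes(out)


tokens = [("lit", ord("a")), ("lit", ord("b")), ("match", 2, 5)]
print(lzss_decode(tokens))  # b'abababa'
```

The byte-by-byte copy is deliberate: a match longer than its distance re-reads bytes written earlier in the same copy, which is how short repeating patterns are encoded compactly.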


Issues


LHICE/ICE

There are copies of LHICE marked as version 1.14. According to Okumura, LHICE was not written by Yoshizaki.


Y2K11 bug

Because of a bug, DOS time stamps in Level 0 and 1 headers dated after the year 2011 are set back to 1980, meaning that some utilities need to be patched. The bug interprets the unsigned 7-bit year bitfield as a 5-bit number; read correctly, the maximum year is 2107. The newer Level 2 and 3 headers use 32-bit Unix time instead, which suffers from the Year 2038 problem.
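The arithmetic behind both limits can be sketched in a few lines of Python. The layout assumed here is the usual 16-bit DOS date word, with the year offset from 1980 in the top seven bits; the helper names `pack_dos_date` and `dos_year` are hypothetical, and the signed 32-bit interpretation of Unix time is an assumption consistent with the Year 2038 problem.

```python
from datetime import datetime, timezone


def pack_dos_date(year, month, day):
    """Pack a date into a DOS date word (assumed layout: 7/4/5 bits)."""
    return ((year - 1980) << 9) | (month << 5) | day


def dos_year(dos_date, year_bits=7):
    """Extract the year, masking the field to the given number of bits."""
    mask = (1 << year_bits) - 1
    return 1980 + ((dos_date >> 9) & mask)


stamp = pack_dos_date(2012, 1, 1)
print(dos_year(stamp))               # 2012, correct 7-bit read
print(dos_year(stamp, year_bits=5))  # 1980, the buggy 5-bit read
print(dos_year(0xFFFF))              # 2107, largest representable year

# Largest value of a signed 32-bit Unix time stamp.
print(datetime.fromtimestamp(2**31 - 1, tz=timezone.utc))
# 2038-01-19 03:14:07+00:00
```

With a 5-bit mask, 2011 (offset 31) is the last year that survives intact, which matches the behaviour described above.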


Header size

According to Micco, the author of the popular LHA library UNLHA32.DLL, many LHA implementations do not check the length of LHA file headers when reading an archive. Two problems can follow: a buffer overrun may occur in naive implementations that assume the 4 KB maximum size from the original specification, and antivirus software may skip over files with such large headers and fail to scan them for viruses. A similar problem exists with ARJ. Micco reported this problem to Japanese authorities, but they did not consider it a valid vulnerability. Micco went so far as to end development of UNLHA32 and advise people to give up on the format. Nevertheless, they came back in 2017 to fix a DLL hijacking issue.
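The kind of check that is reported missing might look like the sketch below. It is not taken from UNLHA32.DLL or any other implementation: the function name `read_header`, the constant `MAX_HEADER_SIZE`, and the choice to reject rather than skip oversized headers are all assumptions, with the 4 KB cap borrowed from the figure quoted above.

```python
import io

# Assumed cap, based on the 4 KB maximum mentioned in the text.
MAX_HEADER_SIZE = 4096


def read_header(stream, declared_size, limit=MAX_HEADER_SIZE):
    """Read a header only after validating its declared length."""
    if declared_size < 0 or declared_size > limit:
        raise ValueError(f"header size {declared_size} exceeds limit {limit}")
    data = stream.read(declared_size)
    if len(data) != declared_size:
        raise ValueError("archive ends before the declared header size")
    return data


print(read_header(io.BytesIO(b"abcdef"), 4))  # b'abcd'
```

Validating the declared length before reading keeps a fixed-size buffer from being overrun, and raising an error instead of silently skipping the entry avoids the scanning gap described for antivirus software.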


See also

* List of archive formats
* LZX


External links


* A history of data compression in Japan
* Document about LHA
* LHA library for Java
* jLHA front-end
* NSRL Magic File, contains PMarc info
* Explzh, current Windows 7 archiver for LZH/LHA (besides LZH it supports RAR, Zip, 7Z, ACE, Tar, Cab and others)
* lhasa, a cross-platform, open source LHA decompressor (with UNLHA32, PMarc and LArc extensions)
* lzh format, document describing the LZH header format