JBIG2 is an image compression standard for bi-level images, developed by the Joint Bi-level Image Experts Group. It is suitable for both lossless and lossy compression. According to a press release from the Group, in its lossless mode JBIG2 typically generates files 3–5 times smaller than Fax Group 4 and 2–4 times smaller than JBIG, the previous bi-level compression standard released by the Group. JBIG2 was published in 2000 as the international standard ITU-T T.88, and in 2001 as ISO/IEC 14492.

Functionality

Ideally, a JBIG2 encoder will segment the input page into regions of text, regions of halftone images, and regions of other data. Regions that are neither text nor halftones are typically compressed using a context-dependent arithmetic coding algorithm called the MQ coder.

Textual regions are compressed as follows: the foreground pixels in the regions are grouped into symbols. A dictionary of symbols is then created and encoded, typically also using context-dependent arithmetic coding, and the regions are encoded by describing which symbols appear where. Typically, a symbol will correspond to a character of text, but this is not required by the compression method. For lossy compression, the difference between similar symbols (e.g., slightly different impressions of the same letter) can be neglected; for lossless compression, this difference is taken into account by compressing one similar symbol using another as a template.

Halftone images may be compressed by reconstructing the grayscale image used to generate the halftone and then sending this image together with a dictionary of halftone patterns.

Overall, the algorithm used by JBIG2 to compress text is very similar to the JB2 compression scheme used in the DjVu file format for coding binary images.

PDF files of version 1.4 and above may contain JBIG2-compressed data. Open-source decoders for JBIG2 include jbig2dec (AGPL), the Java-based jbig2-imageio (Apache-2), and the decoder by Glyph & Cog LLC found in Xpdf and Poppler (both GPL). An open-source encoder is jbig2enc (Apache-2).

Technical details

Typically, a bi-level image consists mainly of a large amount of textual and halftone data, in which the same shapes appear repeatedly. The bi-level image is segmented into three types of regions: text, halftone, and generic regions. Each region type is coded differently, and the coding methods are described in the following sections.

Text image data

Text coding is based on the nature of human visual interpretation. A human observer cannot tell the difference between two instances of the same character in a bi-level image even though they may not match exactly pixel by pixel. Therefore, only the bitmap of one representative character instance needs to be coded, instead of coding the bitmap of each occurrence of the same character individually. The coded representative bitmap of each character is then stored in a "symbol dictionary". There are two encoding methods for text image data: pattern matching and substitution (PM&S) and soft pattern matching (SPM).

Pattern matching and substitution: After image segmentation, each segmented character is compared against the dictionary. If a match exists, the encoder codes an index of the corresponding representative bitmap in the dictionary and the position of the character on the page. The position is usually relative to another previously coded character. If a match is not found, the segmented pixel block is coded directly and added to the dictionary. Typical procedures of the pattern matching and substitution algorithm are displayed in the left block diagram of the figure above. Although PM&S can achieve outstanding compression, substitution errors can be made during the process if the image resolution is low.

Soft pattern matching: In addition to a pointer to the dictionary and the position information of the character, refinement data is also coded, because it is needed to reconstruct the original character in the image. The use of refinement data makes the character-substitution error mentioned earlier highly unlikely. The refinement data contains the current desired character instance, which is coded using the pixels of both the current character and the matching character in the dictionary. Since the current character instance is known to be highly correlated with the matched character, the prediction of the current pixel is more accurate.
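The pattern matching and substitution procedure described above can be illustrated with a minimal Python sketch. It is not the matching rule defined by the standard: the bitmap representation, the Hamming-distance comparison, the `max_diff` threshold, and the names `pms_encode`, `SIX`, and `EIGHT` are all assumptions made for this illustration.

```python
from typing import List, Tuple

# Hypothetical representation: a bitmap is a tuple of rows of '0'/'1' characters.
Bitmap = Tuple[str, ...]

# Two toy 3x5 glyphs that differ in a single pixel (illustrative data only).
SIX: Bitmap = ("111", "100", "111", "101", "111")
EIGHT: Bitmap = ("111", "101", "111", "101", "111")


def hamming(a: Bitmap, b: Bitmap) -> int:
    """Count differing pixels between two equally sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def pms_encode(symbols: List[Tuple[Bitmap, int, int]], max_diff: int):
    """Encode (bitmap, x, y) instances as dictionary indices plus positions.

    max_diff is an assumed mismatch threshold; 0 accepts exact matches only.
    """
    dictionary: List[Bitmap] = []
    placements: List[Tuple[int, int, int]] = []  # (dictionary index, dx, dy)
    prev_x, prev_y = 0, 0
    for bitmap, x, y in symbols:
        index = None
        for i, rep in enumerate(dictionary):
            same_size = len(rep) == len(bitmap) and len(rep[0]) == len(bitmap[0])
            if same_size and hamming(rep, bitmap) <= max_diff:
                index = i  # match found: reuse the representative bitmap
                break
        if index is None:
            dictionary.append(bitmap)  # no match: store the bitmap itself
            index = len(dictionary) - 1
        # Positions are coded relative to the previously coded character.
        placements.append((index, x - prev_x, y - prev_y))
        prev_x, prev_y = x, y
    return dictionary, placements
```

Calling `pms_encode([(SIX, 0, 0), (EIGHT, 4, 0), (SIX, 8, 0)], max_diff=0)` keeps both glyphs in the dictionary, whereas `max_diff=1` collapses them into a single entry, so the "8" would be rendered as a "6". This is the kind of substitution error that soft pattern matching avoids by also coding refinement data.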

Halftones

Halftone images can be compressed using two methods. One of the methods is similar to the context-based arithmetic coding algorithm, which adaptively positions the template pixels in order to capture correlations between adjacent pixels. In the second method, descreening is performed on the halftone image so that the image is converted back to grayscale. The converted grayscale values are then used as indexes into a halftone bitmap dictionary of small, fixed-size bitmap patterns. This allows the decoder to render a halftone image by placing the indexed dictionary bitmap patterns next to each other.
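The second method can be sketched in Python as follows. This is a simplified illustration rather than the procedure defined by the standard: the pattern dictionary filled in raster order, the use of a per-tile black-pixel count as the grayscale value, and the function names are all assumptions, and the image dimensions are assumed to be multiples of the tile size.

```python
from typing import List

Image = List[List[int]]  # rows of 0 (white) / 1 (black) pixels


def make_pattern_dictionary(tile: int) -> List[Image]:
    """Build tile*tile + 1 patterns; pattern g has exactly g black pixels.

    Pixels are switched on in raster order, an arbitrary choice for this sketch.
    """
    patterns = []
    for g in range(tile * tile + 1):
        flat = [1] * g + [0] * (tile * tile - g)
        patterns.append([flat[r * tile:(r + 1) * tile] for r in range(tile)])
    return patterns


def descreen(image: Image, tile: int) -> List[List[int]]:
    """Convert the halftone to a grayscale grid: one black-pixel count per tile."""
    rows, cols = len(image) // tile, len(image[0]) // tile
    return [
        [
            sum(image[r * tile + i][c * tile + j]
                for i in range(tile) for j in range(tile))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def render(gray: List[List[int]], patterns: List[Image], tile: int) -> Image:
    """Decoder side: place the indexed dictionary patterns next to each other."""
    out = []
    for gray_row in gray:
        for i in range(tile):
            out.append([px for g in gray_row for px in patterns[g][i]])
    return out
```

In this sketch the rendered image preserves the number of black pixels in each tile but not necessarily their arrangement, so the round trip is lossy in general: `descreen(render(gray, patterns, tile), tile)` returns `gray`, while `render(descreen(image, tile), patterns, tile)` need not equal `image`.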

Arithmetic entropy coding

All three region types (text, halftone, and generic) may use arithmetic coding. JBIG2 specifically uses the MQ coder, the same entropy encoder employed by JPEG 2000.
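The idea behind context-dependent arithmetic coding can be illustrated by estimating how many bits an ideal arithmetic coder would spend on a bi-level image when each pixel's probability is predicted from already-coded neighbours. The Python sketch below is not the MQ coder: the 4-pixel template, the count-based probability estimate, and the names `TEMPLATE`, `context_of`, and `estimated_bits` are assumptions chosen for brevity.

```python
import math
from typing import List

Image = List[List[int]]  # rows of 0 (white) / 1 (black) pixels

# Hypothetical 4-pixel causal template: (dy, dx) offsets of already-coded neighbours.
TEMPLATE = [(0, -1), (-1, -1), (-1, 0), (-1, 1)]


def context_of(image: Image, y: int, x: int) -> int:
    """Pack the template pixels into an integer; pixels off the page count as 0."""
    ctx = 0
    for dy, dx in TEMPLATE:
        yy, xx = y + dy, x + dx
        inside = 0 <= yy < len(image) and 0 <= xx < len(image[0])
        ctx = (ctx << 1) | (image[yy][xx] if inside else 0)
    return ctx


def estimated_bits(image: Image) -> float:
    """Sum the ideal code length, -log2(p), of every pixel in raster order."""
    # One adaptive pair of counts [zeros, ones] per context, starting at 1 each.
    counts = [[1, 1] for _ in range(1 << len(TEMPLATE))]
    bits = 0.0
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            c = counts[context_of(image, y, x)]
            bits += -math.log2(c[pixel] / (c[0] + c[1]))
            c[pixel] += 1  # adapt the model after coding the pixel
    return bits
```

For an all-white 8×8 image every pixel falls into the same context, and the estimate comes to log2(65), about 6 bits, instead of the 64 bits of the raw bitmap. A real coder such as the MQ coder approximates this behaviour with a table-driven probability estimator rather than explicit counts.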

Patents

Patents for JBIG2 are owned by IBM and Mitsubishi. Free licenses should be available after a request. JBIG and JBIG2 patents are not the same.

Disadvantages

When used in lossy mode, JBIG2 compression can alter text in a way that is not discernible as corruption. This is in contrast to some other algorithms, which simply degrade into a blur, making the compression artifacts obvious. Since JBIG2 tries to match up similar-looking symbols, the digits "6" and "8" may be substituted for one another, for example.

In 2013, various substitutions (including replacing "6" with "8") were reported to happen on many Xerox WorkCentre photocopier and printer machines. Numbers printed on scanned (but not OCR-ed) documents had potentially been altered. This has been demonstrated on construction blueprints and some tables of numbers; the potential impact of such substitution errors in documents such as medical prescriptions was briefly mentioned. German computer scientist David Kriesel and Xerox investigated the issue.

Xerox subsequently acknowledged that this was a long-standing software defect, and that its initial statements suggesting that only non-factory settings could introduce the substitution were incorrect. Patches that comprehensively address the problem were published later in August of that year, but no attempt has been made to recall or mandate updates to the affected devices, which were acknowledged to span more than a dozen product families. Documents previously scanned may still contain errors, making their veracity difficult to substantiate.

Following publicity about the potential for errors, authorities in some countries issued statements discouraging the use of JBIG2. In Germany, the Federal Office for Information Security has issued a technical guideline that says the JBIG2 encoding "MUST NOT be used" for "replacement scanning". In Switzerland, the Coordination Office for the Permanent Archiving of Electronic Documents (Koordinationsstelle für die dauerhafte Archivierung elektronischer Unterlagen) has recommended against the use of JBIG2 when creating PDF documents.

Exploit

A vulnerability in the Xpdf implementation of JBIG2, re-used in Apple's iOS mobile operating system, was used by the Pegasus spyware to implement a zero-click attack on iPhones by constructing an emulated computer architecture inside a JBIG2 stream. Apple fixed this "FORCEDENTRY" vulnerability in iOS 14.8 in September 2021.

External links

T.88: Lossy/lossless coding of bi-level images