TeX Font Metric

TeX font metric (TFM) is a font file format used by the TeX typesetting system. It is a font metric format, not an outline font format like TrueType, because it provides only the information necessary to typeset the font, such as each character's width, height and depth. The actual glyphs are stored elsewhere. This is not unique to TeX; Adobe's AFM files and Windows' PFM files (NTF in the modern Windows PostScript driver) use the same technique.

TFM files contain all of the information TeX needs to produce its device-independent (DVI) output. The actual glyphs are then inserted by the eventual DVI output driver or previewer, using, for instance, TrueType fonts, or fonts in the bitmap PK format derived from a METAFONT source. The format is designed to be extremely compact: in the original Computer Modern distribution, every font's TFM file is smaller than 2 kB.


Specification

The canonical specification of the TFM format is embedded in the source code of the program TFtoPL. A TFM file is broken down into a series of four-byte words, which can contain data fields of various lengths. Any data field more than one byte long is stored in big-endian order, so exactly the same file is generated regardless of the architecture of the computer generating it.

The file begins with six words (24 bytes) holding twelve unsigned 16-bit integers, which give the length of the file, the range of character codes contained in the font, and the size of each of the tables. A single TFM file describes between 0 and 256 characters, inclusive.

The body of the TFM file consists of a series of ten tables, each one except for the first laid out as an array of fixed-length fields. A 32-bit signed fixed-point number with 12 bits to the left of the binary point and 20 to the right, referred to as a fix_word, is used heavily.

The first table, header, contains a checksum designed to prevent a document compiled into a DVI file with one set of fonts from being printed with a different set, as well as ASCII descriptions of the character coding scheme (e.g., ASCII or TeX text) and the font family. It also contains the font's design size; all following fix_word values are interpreted as multiplication factors for this.

The next table, char_info, consists of one word per character, and contains indexes into the width, height, depth and italic correction tables. This is a device to save space, because width values, for instance, are frequently duplicated. To fit all of these indexes into a single word, the height and depth indexes are limited to four bits each, which is workable because height and depth values are duplicated even more frequently than widths. As a result, there is a limit of sixteen different character heights and sixteen different character depths in any given TFM file. Likewise, the italic correction index is six bits wide, giving a limit of sixty-four different italic corrections. The word also holds one more index, which points into the lig_kern table or to information about extensible characters, depending on a two-bit tag value. Extensible characters use a series of repeated characters to construct a single large one of arbitrary size, usually large delimiters such as parentheses or brackets.

There then follow the four tables width, height, depth and italic, which contain the values (in fix_word format) referred to by the indexes in char_info.

Ligatures and kerning are represented using a simple programming language consisting of fixed-length four-byte operations in the lig_kern table; it makes use of kerning values (specified as fix_words) in the kern table, which follows it.

Extensible characters are specified in the exten table, using a series of four-byte words specifying the top, middle, bottom and repeated sections of an extensible character. For instance, the character at left below would be obtained by setting (top, mid, bot, rep) to the character codes for (/, <, \, |). Any of the first three character codes can be set to zero, in which case that piece is left out. For instance, if mid were set to 0 in the previous example, the result would change from the brace drawn at left to the parenthesis drawn to its right.

    /     /
    |     |
    |     |
   <      |
    |     |
    |     |
    \     \

Of course, the font would use specially designed characters for this, instead of reusing existing ones, but the principle is the same.

The final table, param, contains a series of specifically defined fix_word values, including the font's x-height and the amount of italic slant (to determine how far to shift accents). Certain coding schemes, such as TeX math symbols and TeX math extension, define extra parameters which appear after these.
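
The length fields and the fix_word format described above can be decoded with a few lines of Python. This is a minimal sketch, not a full TFM reader: the field names (lf, lh, bc, ec, nw, nh, nd, ni, nl, nk, ne, np) and the length formula follow the TFtoPL documentation rather than anything stated in this article, and the function names and sample bytes are made up for illustration.

```python
import struct

# Abbreviations used in the TFtoPL documentation for the twelve length fields:
# file length, header length, first and last character code, then the sizes of
# the width, height, depth, italic, lig_kern, kern, exten and param tables.
HEADER_FIELDS = ("lf", "lh", "bc", "ec", "nw", "nh", "nd", "ni", "nl", "nk", "ne", "np")


def parse_tfm_lengths(data: bytes) -> dict:
    """Decode the twelve big-endian 16-bit integers at the start of a TFM file."""
    if len(data) < 24:
        raise ValueError("a TFM file starts with 24 bytes of length fields")
    values = struct.unpack(">12H", data[:24])
    return dict(zip(HEADER_FIELDS, values))


def expected_length_in_words(lengths: dict) -> int:
    """File length in words implied by the table sizes (formula from TFtoPL)."""
    char_count = lengths["ec"] - lengths["bc"] + 1
    tables = sum(lengths[name] for name in ("nw", "nh", "nd", "ni", "nl", "nk", "ne", "np"))
    return 6 + lengths["lh"] + char_count + tables


def fix_word_to_float(raw: bytes) -> float:
    """Interpret four big-endian bytes as a signed number with 20 fractional bits."""
    (value,) = struct.unpack(">i", raw)
    return value / (1 << 20)


# Hypothetical length fields for a one-character font (not a real font file).
sample = struct.pack(">12H", 14, 2, 0, 0, 2, 1, 1, 1, 0, 0, 0, 0)
lengths = parse_tfm_lengths(sample)
print(lengths["lf"] == expected_length_in_words(lengths))
print(fix_word_to_float(b"\x00\xa0\x00\x00"))  # 10.0
```

Comparing lf against the sum of the table sizes is a cheap sanity check before reading the rest of the file.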

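The packing of a char_info word can be illustrated in the same way. The sketch below assumes the field order given in the TFtoPL documentation (width index in the first byte, height and depth indexes sharing the second, italic index and tag sharing the third, and the remaining index in the fourth); the class and function names are hypothetical.

```python
from typing import NamedTuple


class CharInfo(NamedTuple):
    width_index: int   # 8 bits
    height_index: int  # 4 bits, hence at most sixteen distinct heights
    depth_index: int   # 4 bits, hence at most sixteen distinct depths
    italic_index: int  # 6 bits, hence at most sixty-four italic corrections
    tag: int           # 2 bits, selects how the remainder is interpreted
    remainder: int     # 8 bits, e.g. an index into lig_kern or exten


def unpack_char_info(word: bytes) -> CharInfo:
    """Split one four-byte char_info word into its six fields."""
    if len(word) != 4:
        raise ValueError("a char_info entry is exactly one four-byte word")
    b0, b1, b2, b3 = word
    return CharInfo(
        width_index=b0,
        height_index=b1 >> 4,
        depth_index=b1 & 0x0F,
        italic_index=b2 >> 2,
        tag=b2 & 0x03,
        remainder=b3,
    )


# Made-up word: width 18, height 10, depth 3, italic 3, tag 1, remainder 64.
print(unpack_char_info(bytes([0x12, 0xA3, 0x0D, 0x40])))
```

Each index is then looked up in the corresponding width, height, depth or italic table to obtain the actual fix_word value.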

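The way an extensible character is assembled from its four pieces can also be sketched in code. This is a simplified illustration using ASCII stand-ins, not TeX's actual delimiter-building routine: the function name is hypothetical, an absent piece is represented by None instead of character code 0, and the repeated piece is assumed to appear the same number of times above and below the middle piece.

```python
from typing import List, Optional


def build_extensible(top: Optional[str], mid: Optional[str], bot: Optional[str],
                     rep: str, repeats: int) -> List[str]:
    """Return the pieces of an extensible character from top to bottom.

    `repeats` is the number of copies of `rep` in each stretchable run:
    one run if there is no middle piece, two runs if there is.
    """
    pieces: List[str] = []
    if top is not None:
        pieces.append(top)
    pieces.extend([rep] * repeats)
    if mid is not None:
        pieces.append(mid)
        pieces.extend([rep] * repeats)
    if bot is not None:
        pieces.append(bot)
    return pieces


# A brace-like shape with a middle piece, then a parenthesis-like one without.
print("\n".join(build_extensible("/", "<", "\\", "|", 2)))
print("\n".join(build_extensible("/", None, "\\", "|", 5)))
```

Increasing `repeats` makes the character taller without any new glyphs, which is why a single exten entry can serve delimiters of any size.
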
Property lists

There is a human-readable equivalent to the TFM format called PL, for property list. There is an exact correspondence between a TFM file and a PL file: one can be freely converted to the other and back again with no loss of information using the tftopl and pltotf programs.

The PL format, optimized for usability instead of space, does not make the same use of references that the TFM format does. For instance, many characters in a font may share the same width. The TFM format stores that value only once and has each character refer to it, since the index is significantly smaller than the full-precision numerical value. In the PL format, however, the full value is written out each time it appears.

The entry for the upper-case letter Y in Computer Modern Roman at ten point is an example. The kerning values shown in that entry are copied from another section of the PL file in order to make it easier to read, which is in itself redundant, and the full numeric value of each kerning constant is written out every time it appears instead of being stored once and referred to by a much smaller index.
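
The difference between the two representations can be shown with a small sketch. The function below is a hypothetical illustration of value sharing, not the algorithm used by pltotf: it turns a list of per-character values, as a PL file would spell them out, into a table of distinct values plus one small index per character, as a TFM file stores them. The sample widths are made up.

```python
from typing import Dict, List, Tuple


def share_values(values: List[float]) -> Tuple[List[float], List[int]]:
    """Return (table of distinct values, per-character index into that table)."""
    table: List[float] = []
    position: Dict[float, int] = {}
    indexes: List[int] = []
    for value in values:
        if value not in position:
            # First time this value is seen: give it the next free table slot.
            position[value] = len(table)
            table.append(value)
        indexes.append(position[value])
    return table, indexes


# Five characters that use only two distinct widths.
table, indexes = share_values([0.5, 0.75, 0.5, 0.5, 0.75])
print(table)    # [0.5, 0.75]
print(indexes)  # [0, 1, 0, 0, 1]
```

Going the other way, from the table and indexes back to the full list, is a plain lookup, which is why the conversion between TFM and PL loses no information.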


External links


Description of the TFM File Format