External metadata
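A final way of storing the format of a file is to explicitly store information about the format in the file system, rather than within the file itself. This approach keeps the metadata separate from both the main data and the name. However, it is also less portable, because the format information can be lost when the file is copied to a file system, archive, or transfer medium that does not preserve it.
Mac OS type-codes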
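On disk, the four-character codes described in this section are 32-bit values whose bytes spell out the characters. The following is a minimal sketch, assuming plain ASCII codes and big-endian byte order (the first character in the high byte). The function names are hypothetical, and "TEXT" is used only as an illustrative code.

```python
import struct

def ostype_to_int(code: str) -> int:
    """Pack a four-character code into the 32-bit value it stands for.

    Assumes exactly four ASCII characters (classic Mac OS used Mac Roman,
    which agrees with ASCII for printable characters like these).
    """
    raw = code.encode("ascii")
    if len(raw) != 4:
        raise ValueError("an OSType is exactly 4 bytes")
    # Big-endian: the first character ends up in the most significant byte.
    return struct.unpack(">I", raw)[0]

def int_to_ostype(value: int) -> str:
    """Unpack a 32-bit value back into its four-character form."""
    return struct.pack(">I", value).decode("ascii")

code = ostype_to_int("TEXT")
print(hex(code))            # 0x54455854
print(int_to_ostype(code))  # TEXT
```

Because each byte is an ASCII character, a developer inspecting the raw directory entry can still read the code directly. This is one reason mnemonic codes were popular.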
The classic Mac OS Hierarchical File System (HFS) stores codes for creator and type as part of the directory entry for each file. These codes are referred to as OSTypes. They could be any 4-byte sequence, but were often chosen so that their ASCII representation formed a meaningful sequence of characters, such as an abbreviation of the application's name or the developer's initials. For instance, a HyperCard "stack" file has a creator code derived from HyperCard's previous name, "WildCard", and a type code identifying it as a stack. The BBEdit text editor has a creator code referring to its original programmer.
Mac OS X uniform type identifiers (UTIs)
A Uniform Type Identifier (UTI) is a method used in macOS to uniquely identify classes of entities, such as file formats.
OS/2 extended attributes
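The extended attributes described in this section are unique-name triplets of name, type code, and value. The following is a conceptual Python model of those triplets. It is not the real OS/2 on-disk encoding. The class name, the numeric type code, and the NUL-separated storage of the `.TYPE` list are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

MAX_VALUE_BYTES = 64 * 1024  # the 64 KB per-value limit described in this section

@dataclass
class ExtendedAttributes:
    """Conceptual model of one file's OS/2-style extended attributes.

    Each entry is a (name, type code, value) triplet keyed by its unique
    name. This models the idea only, not the on-disk binary encoding.
    """
    entries: dict = field(default_factory=dict)  # name -> (type_code, value)

    def set(self, name: str, type_code: int, value: bytes) -> None:
        if len(value) > MAX_VALUE_BYTES:
            raise ValueError(f"value for {name!r} exceeds 64 KB")
        # Setting an existing name replaces it, so names stay unique.
        self.entries[name] = (type_code, value)

    def file_types(self) -> list:
        """Return the file types listed in the .TYPE attribute, if any."""
        entry = self.entries.get(".TYPE")
        if entry is None:
            return []
        # Simplification (assumption): types stored as NUL-separated text.
        return [t for t in entry[1].decode("ascii").split("\0") if t]

HYPOTHETICAL_TYPE_CODE = 1  # made-up type code, not a real OS/2 constant

eas = ExtendedAttributes()
eas.set(".TYPE", HYPOTHETICAL_TYPE_CODE, b"Plain Text\0HTML document")
print(eas.file_types())  # ['Plain Text', 'HTML document']
```

The multi-valued `.TYPE` entry is what lets a single file carry several types at once.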
The HPFS, FAT12, and FAT16 (but not FAT32) file systems allow the storage of "extended attributes" with files. Each file can carry an arbitrary set of triplets consisting of a name, a coded type for the value, and a value, where the names are unique and values can be up to 64 KB long. There are standardized meanings for certain types and names (under OS/2). One of these is the ".TYPE" extended attribute, which is used to determine the file type. Its value comprises a list of one or more file types associated with the file, each of which is a string, such as "Plain Text" or "HTML document". Thus a file may have several types.

The NTFS file system also allows storage of OS/2 extended attributes, as one of the file forks. However, this feature is present only to support the OS/2 subsystem (which is not present in Windows XP), so the Win32 subsystem treats this information as an opaque block of data and does not use it. Instead, it relies on other file forks to store meta-information in Win32-specific formats. OS/2 extended attributes can still be read and written by Win32 programs, but applications must parse the data entirely themselves.
POSIX extended attributes
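Extended attributes are name/value pairs that a file system stores alongside a file, outside its contents. The following is a minimal sketch, assuming Linux, where Python's `os.setxattr` and `os.getxattr` are available, and a file system mounted with user-attribute support. The attribute name `user.mime_type` and the helper `tag_and_read` are illustrative choices. The `user.` prefix is the namespace Linux lets unprivileged processes write.

```python
import os
import tempfile

ATTR = "user.mime_type"  # illustrative attribute name in the "user." namespace

def tag_and_read(path: str, mime_type: str):
    """Store a MIME type in an extended attribute and read it back.

    Returns None when the platform or file system does not support
    user extended attributes (os.setxattr exists only on Linux).
    """
    if not hasattr(os, "setxattr"):
        return None
    try:
        os.setxattr(path, ATTR, mime_type.encode("utf-8"))
        return os.getxattr(path, ATTR).decode("utf-8")
    except OSError:
        # e.g. ENOTSUP on a file system without user xattr support
        return None

# The file name carries no useful extension; the type lives in the attribute.
with tempfile.NamedTemporaryFile(suffix=".dat") as f:
    print(tag_and_read(f.name, "text/plain"))
```

As the External metadata introduction notes, such attributes can be silently dropped when the file is copied to a medium that does not support them.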
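On Unix and Unix-like systems, many file systems (for example ext4 and XFS) likewise allow extended attributes to be stored with files.
PRONOM unique identifiers (PUIDs)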
The PRONOM Persistent Unique Identifier (PUID) is an extensible scheme of persistent, unique, and unambiguous identifiers for file formats. It was developed by The National Archives of the UK as part of its PRONOM technical registry service. PUIDs can also be expressed as Uniform Resource Identifiers under a dedicated namespace. Although the scheme is not yet widely used outside of the UK government and some digital preservation programs, it provides greater granularity than most alternative schemes.
MIME types
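MIME types are plain `type/subtype` strings, so software often maps between them and filename extensions. The following is a minimal sketch using Python's standard `mimetypes` module, which holds such a table and may extend it from system files such as `mime.types`. The file names are made up for illustration.

```python
import mimetypes

# Guess a MIME type from each (made-up) file name's extension.
for name in ["index.html", "photo.png", "notes.txt", "archive.unknownext"]:
    mime_type, encoding = mimetypes.guess_type(name)
    print(f"{name}: {mime_type}")  # None when the extension is unknown

# The reverse direction: a plausible extension for a given MIME type.
print(mimetypes.guess_extension("text/html"))
```

This only guesses from the name. Protocols such as HTTP instead send the MIME type explicitly with the data, in a `Content-Type` header.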
MIME types are widely used in many Internet protocols, such as email and HTTP, to label the format of transmitted data.
File format identifiers (FFIDs)
File format identifiers are another, not widely used, way to identify file formats according to their origin and their file category. They were created for the Description Explorer suite of software. An identifier is a string of digits divided into parts. The first part indicates the originating organization or maintainer (this number represents a value in a database of companies and standards organizations), and the two following digits categorize the type of file.
File content based format identification
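A byte frequency distribution is a 256-entry histogram of how often each byte value occurs. The following is a minimal sketch of one way such models can be used: a nearest-centroid classifier using Euclidean distance. This is a deliberately simple choice, and real identification schemes differ. The training strings are toy data, and all function names are hypothetical.

```python
from collections import Counter
import math

def byte_frequency(data: bytes) -> list:
    """Normalized byte frequency distribution: 256 values summing to 1."""
    counts = Counter(data)  # iterating bytes yields ints 0-255
    total = len(data) or 1  # avoid dividing by zero for empty input
    return [counts.get(b, 0) / total for b in range(256)]

def centroid(samples: list) -> list:
    """Average the distributions of several samples of one file type."""
    dists = [byte_frequency(s) for s in samples]
    return [sum(col) / len(dists) for col in zip(*dists)]

def distance(p: list, q: list) -> float:
    """Euclidean distance between two distributions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def classify(data: bytes, models: dict) -> str:
    """Return the file type whose model lies closest to the data."""
    dist = byte_frequency(data)
    return min(models, key=lambda name: distance(dist, models[name]))

# Toy training data; a real system would use many real files per type.
models = {
    "text": centroid([b"hello world, plain text here", b"another line of prose"]),
    "binary": centroid([bytes(range(256)), bytes([0, 255] * 64)]),
}
print(classify(b"some more ordinary text", models))  # text
```

Text concentrates its mass on letters and spaces, while the binary samples put theirs on 0x00 and 0xFF. Even this crude distance therefore separates the two.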
Another, less popular way to identify the file format is to examine the file contents for patterns that distinguish one file type from another. The contents of a file are a sequence of bytes, and each byte can take one of 256 values (0–255). Counting how often each byte value occurs, often referred to as the byte frequency distribution, therefore yields patterns that can help identify file types. Many content-based file type identification schemes use the byte frequency distribution to build representative models for each file type, then apply statistical and data mining techniques to identify file types.
File structure
There are several ways to structure data in a file. The most common ones are described below.
Unstructured formats (raw memory dumps)
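One portability problem of writing raw memory images is that the same structure can produce different bytes depending on the machine's byte order and alignment. The following is a minimal sketch using Python's `struct` module. The record, a 16-bit version number followed by a 32-bit length, is hypothetical.

```python
import struct

version, length = 2, 1000  # hypothetical record fields

# Like a memory dump: native byte order, size, and alignment ("@"),
# so the bytes (and any padding) depend on the machine that wrote them.
native = struct.pack("@HI", version, length)

# Two explicit layouts a reader on another platform might assume.
little = struct.pack("<HI", version, length)  # no padding, 6 bytes
big = struct.pack(">HI", version, length)

print(len(native), len(little))  # differ when native alignment adds padding
print(little.hex())  # 0200e8030000
print(big.hex())     # 0002000003e8

# Reading little-endian bytes as big-endian silently yields wrong values.
print(struct.unpack(">HI", little))  # (512, 3892510720) instead of (2, 1000)
```

Nothing in the file records which layout was used, so a reader can only succeed if it happens to share the writer's assumptions.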
Earlier file formats used raw data formats that consisted of directly dumping the memory images of one or more structures into the file. This has several drawbacks. Unless the memory images also have reserved space for future extensions, extending and improving this type of structured file is very difficult. It also creates files that might be specific to one platform or programming language, for example because platforms and languages differ in byte order, field sizes, or string representation.
Chunk-based formats
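A common way to mark a container's scope is an identifier followed by an explicit length field. The following is a minimal sketch, loosely modeled on the IFF convention: a 4-byte ASCII ID, a 4-byte big-endian length, the payload, and a pad byte to keep chunks at even offsets. The exact layout and the chunk IDs `NAME`, `XTRA`, and `TEXT` are assumptions made for illustration.

```python
import struct

def write_chunk(chunk_id: bytes, payload: bytes) -> bytes:
    """Encode one chunk: 4-byte ID, 4-byte big-endian length, payload, pad."""
    if len(chunk_id) != 4:
        raise ValueError("chunk IDs are 4 bytes")
    pad = b"\0" if len(payload) % 2 else b""  # keep the next chunk at an even offset
    return chunk_id + struct.pack(">I", len(payload)) + payload + pad

def read_chunks(data: bytes):
    """Yield (chunk_id, payload) pairs, using each length field to find the next chunk."""
    offset = 0
    while offset + 8 <= len(data):
        chunk_id = data[offset:offset + 4]
        (length,) = struct.unpack(">I", data[offset + 4:offset + 8])
        start = offset + 8
        yield chunk_id, data[start:start + length]
        # Skip the payload plus the pad byte that follows odd lengths.
        offset = start + length + (length % 2)

doc = write_chunk(b"NAME", b"Report") + write_chunk(b"XTRA", b"abc") + write_chunk(b"TEXT", b"Hello")

known = {b"NAME", b"TEXT"}
for chunk_id, payload in read_chunks(doc):
    if chunk_id in known:
        print(chunk_id.decode(), payload.decode())
    # Unknown chunks such as XTRA are skipped, so an older reader can
    # still handle files written by a newer version of the format.
```

The length field is what makes this extensible: a reader never needs to understand a chunk in order to step over it.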
In this kind of file structure, each piece of data is embedded in a container that somehow identifies the data. The container's scope can be identified by start and end markers of some kind, by an explicit length field somewhere, or by fixed requirements of the file format's definition. Throughout the 1970s, many programs used formats of this general kind; word processors such as troff are one example.
Directory-based formats
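In a directory-based layout, a table of entries records where each item lives, so a reader can jump straight to the item it wants. The following is a minimal sketch in Python of a toy container with a header, directory entries, and data. The signature `DIRF`, the 8-byte name field, and the function names are all hypothetical, and names longer than 8 bytes would be truncated by `struct`.

```python
import struct

MAGIC = b"DIRF"                  # hypothetical signature for this toy format
ENTRY = struct.Struct(">8sII")   # name (NUL-padded to 8 bytes), offset, length

def build(files: dict) -> bytes:
    """Lay out a header, a directory of entries, then the data blobs."""
    header = MAGIC + struct.pack(">I", len(files))
    offset = len(header) + ENTRY.size * len(files)  # data starts after the directory
    directory, blobs = b"", b""
    for name, data in files.items():
        directory += ENTRY.pack(name.encode("ascii"), offset, len(data))
        blobs += data
        offset += len(data)
    return header + directory + blobs

def read_entry(container: bytes, wanted: str) -> bytes:
    """Locate one item through the directory instead of scanning the data."""
    if container[:4] != MAGIC:
        raise ValueError("not a DIRF container")
    (count,) = struct.unpack(">I", container[4:8])
    for i in range(count):
        name, offset, length = ENTRY.unpack_from(container, 8 + i * ENTRY.size)
        if name.rstrip(b"\0").decode("ascii") == wanted:
            return container[offset:offset + length]
    raise KeyError(wanted)

box = build({"title": b"Annual report", "body": b"Revenue grew."})
print(read_entry(box, "body"))  # b'Revenue grew.'
```

Because entries store offsets rather than relying on order, new kinds of items can be added without disturbing readers that only look up the names they know.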
This is another extensible format that closely resembles a file system (OLE documents are actual file systems). The file is composed of "directory entries" that contain the location of the data within the file itself, as well as its signature (and in certain cases its type).
See also
* Audio file format
* Chemical file format