Filename extensions

A filename extension, file name extension or file extension is a suffix to the name of a computer file (e.g., .txt, .docx, .md). The extension indicates a characteristic of the file contents or its intended use. A filename extension is typically delimited from the rest of the filename with a full stop (period), but in some systems it is separated with spaces. Other extension formats include dashes and/or underscores on early versions of Linux and some versions of IBM AIX. Some file systems implement filename extensions as a feature of the file system itself and may limit the length and format of the extension, while others treat filename extensions as part of the filename without special distinction.


Usage

Filename extensions may be considered a type of metadata. They are commonly used to imply information about the way data might be stored in the file. The exact definition, giving the criteria for deciding what part of the file name is its extension, belongs to the rules of the specific filesystem used; usually the extension is the substring which follows the last occurrence, if any, of the dot character (for example, txt is the extension of the filename readme.txt, and html the extension of mysite.index.html).

On file systems of some mainframe systems such as CMS in VM, VMS, and of PC systems such as CP/M and derivative systems such as MS-DOS, the extension is a separate namespace from the filename. Under Microsoft's DOS and Windows, extensions such as EXE, COM or BAT indicate that a file is a program executable. In OS/360 and successors, the part of the dataset name following the last period is treated as an extension by some software, e.g., TSO EDIT, but it has no special significance to the operating system itself; the same applies to Unix files in MVS.

Filesystems for UNIX-like operating systems do not separate the extension metadata from the rest of the file name. The dot character is just another character in the main filename. A file name may have no extension. Sometimes it is said to have more than one extension, although terminology varies in this regard, and most authors define "extension" in a way that doesn't allow more than one in the same file name. More than one extension usually represents nested transformations, such as files.tar.gz (the .tar indicates that the file is a tar archive of one or more files, and the .gz indicates that the tar archive file is compressed with gzip). Programs transforming or creating files may add the appropriate extension to names inferred from input file names (unless explicitly given an output file name), but programs reading files usually ignore the information; it is mostly intended for the human user. It is more common, especially in binary files, for the file itself to contain internal metadata describing its contents. Because the extension is part of the name, this model generally requires the full filename to be provided in commands, whereas systems that treat the extension as separate metadata often allow it to be omitted.

The VFAT, NTFS, and ReFS file systems for Windows also do not separate the extension metadata from the rest of the file name, and allow multiple extensions. With the advent of graphical user interfaces, the issue of file management and interface behavior arose. Microsoft Windows allowed multiple applications to be associated with a given extension, and different actions were available for selecting the required application, such as a context menu offering a choice between viewing, editing or printing the file. The assumption was still that any extension represented a single file type; there was an unambiguous mapping between extension and icon.

The classic Mac OS disposed of filename-based extension metadata entirely; it used, instead, a distinct file type code to identify the file format. Additionally, a creator code was specified to determine which application would be launched when the file's icon was double-clicked. macOS, however, uses filename suffixes, as well as type and creator codes, as a consequence of being derived from the UNIX-like NeXTSTEP operating system.
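
The "substring after the last dot" rule and the idea of several nested suffixes can be illustrated with a short Python sketch. The helper name last_extension is hypothetical and implements only the simple rule stated in this section; real file systems and libraries may apply additional rules (for instance, pathlib does not treat the leading dot of a name such as .bashrc as starting an extension).

```python
from pathlib import PurePosixPath


def last_extension(filename: str) -> str:
    """Return the text after the last dot, or '' if the name has no dot.

    Hypothetical helper: it follows the simple rule described above and
    nothing more.
    """
    _, dot, ext = filename.rpartition(".")
    return ext if dot else ""


print(last_extension("readme.txt"))         # txt
print(last_extension("mysite.index.html"))  # html
print(last_extension("Makefile"))           # (empty string)

# pathlib exposes both the final suffix and the full chain of suffixes,
# which matches the "nested transformations" reading of files.tar.gz.
archive = PurePosixPath("files.tar.gz")
print(archive.suffix)    # .gz
print(archive.suffixes)  # ['.tar', '.gz']
```

Reading the suffixes list from right to left gives the order in which the transformations would be undone: first decompress with gzip, then unpack the tar archive.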


Improvements

The filename extension was originally used to determine the file's generic type. The need to condense a file's type into three characters frequently led to abbreviated extensions. Examples include using .GFX for graphics files, .TXT for plain text, and .MUS for music. However, because many different software programs have been made that all handle these data types (and others) in a variety of ways, filename extensions started to become closely associated with certain products, and even with specific product versions. For example, early WordStar files used .WS or .WSn, where n was the program's version number. Conflicting uses of some filename extensions also developed. One example is .rpm, used for both RPM Package Manager packages and RealPlayer Media files. Others are .qif, shared by DESQview fonts, Quicken financial ledgers, and QuickTime pictures; .gba, shared by GrabIt scripts and Game Boy Advance ROM images; .sb, used for SmallBasic and Scratch; and .dts, used for Dynamix Three Space and DTS.

Some other operating systems that used filename extensions generally had fewer restrictions on filenames. Many allowed full filename lengths of 14 or more characters, and maximum name lengths up to 255 were not uncommon. The file systems in operating systems such as Multics and UNIX stored the file name as a single string, not split into base name and extension components, allowing the "." to be just another character allowed in file names. Such systems generally allow for variable-length filenames, permitting more than one dot, and hence multiple suffixes. Some components of Multics and UNIX, and applications running on them, used suffixes in some cases to indicate file types, but they did not use them as much; for example, executables and ordinary text files had no suffixes in their names. The High Performance File System (HPFS), used in Microsoft and IBM's OS/2, also supported long file names and did not divide the file name into a name and an extension. The convention of using suffixes continued, even though HPFS supported extended attributes for files, allowing a file's type to be stored in the file as an extended attribute. NTFS, the native file system of Microsoft's Windows NT, supported long file names and did not divide the file name into a name and an extension, but again, the convention of using suffixes to simulate extensions continued, for compatibility with existing versions of Windows.

When the Internet age first arrived, those using Windows systems that were still restricted to 8.3 filename formats had to create web pages with names ending in .HTM, while those using Macintosh or UNIX computers could use the recommended .html filename extension. This also became a problem for programmers experimenting with the Java programming language, since it requires the four-letter suffix .java for source code files and the five-letter suffix .class for Java compiler object code output files. Eventually, Windows 95 introduced support for long file names, and removed the 8.3 name/extension split in file names from non-NT Windows, in an extended version of the commonly used FAT file system called VFAT. VFAT first appeared in Windows NT 3.5 and Windows 95. The internal implementation of long file names in VFAT is largely considered to be a kludge, but it removed the important length restriction and allowed files to have a mix of upper case and lower case letters, on machines that would not run Windows NT well.
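
To make the 8.3 restriction concrete, the following Python sketch checks only the length rule: a base name of at most eight characters and an extension of at most three, separated by a single dot. The function name fits_8_3 is hypothetical, and the check is deliberately simplified; it ignores the character-set and case rules that real DOS file systems also applied.

```python
def fits_8_3(filename: str) -> bool:
    """Rough length-only check for the 8.3 naming scheme.

    Hypothetical helper: it does not validate which characters are
    allowed, only how many appear on each side of the dot.
    """
    base, _, ext = filename.partition(".")
    if "." in ext:
        # More than one dot cannot be expressed as name + extension.
        return False
    return 1 <= len(base) <= 8 and len(ext) <= 3


for name in ["INDEX.HTM", "index.html", "Main.java", "Main.class", "README"]:
    print(name, fits_8_3(name))
```

Under this check INDEX.HTM is accepted while index.html, Main.java and Main.class are rejected, which is the mismatch described above for web pages and Java files.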


Command name issues

The use of a filename extension in a command name appears occasionally, usually as a side effect of the command having been implemented as a script, e.g., for the Bourne shell or for Python, and the interpreter name being suffixed to the command name. This practice is common on systems that rely on associations between filename extension and interpreter, but sharply deprecated in Unix-like systems, such as Linux, Oracle Solaris, BSD-based systems, and Apple's macOS, where the interpreter is normally specified as a header in the script ("shebang").

On association-based systems, the filename extension is generally mapped to a single, system-wide selection of interpreter for that extension (such as ".py" meaning to use Python), and the command itself is runnable from the command line even if the extension is omitted (assuming appropriate setup is done). If the implementation language is changed, the command name extension is changed as well, and the OS provides a consistent API by allowing the same extensionless version of the command to be used in both cases. This method suffers somewhat from the essentially global nature of the association mapping, as well as from the fact that developers do not consistently omit extensions when calling programs and cannot be forced to do so. Windows is the only remaining widespread employer of this mechanism.

On systems with interpreter directives, including virtually all versions of Unix, command name extensions have no special significance, and are by standard practice not used, since the primary method to set interpreters for scripts is to start them with a single line specifying the interpreter to use (which could be viewed as a degenerate resource fork). In these environments, including the extension in a command name unnecessarily exposes an implementation detail, which puts all references to the command from other programs at future risk if the implementation changes. For example, it would be perfectly normal for a shell script to be reimplemented in Python or Ruby, and later in C or C++, all of which would change the name of the command were extensions used. Without extensions, a program always has the same extension-less name, with only the interpreter directive and/or magic number changing, and references to the program from other programs remain valid.
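
As an illustration of how an interpreter directive carries the information that a command name extension would otherwise expose, here is a minimal Python sketch that reads the first line of a script. The function name parse_shebang is hypothetical, and splitting the line into an interpreter and at most one argument is a simplifying assumption; operating systems differ in how they split and limit this line.

```python
from typing import Optional, Tuple


def parse_shebang(first_line: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (interpreter, argument) from a '#!' line, or None.

    Hypothetical helper for illustration; it does not execute anything.
    """
    if not first_line.startswith("#!"):
        return None
    fields = first_line[2:].strip().split(maxsplit=1)
    if not fields:
        return None
    interpreter = fields[0]
    argument = fields[1] if len(fields) > 1 else None
    return interpreter, argument


# The command can keep the same extension-less name while only this
# first line changes between implementations.
print(parse_shebang("#!/bin/sh\n"))
print(parse_shebang("#!/usr/bin/env python3\n"))
print(parse_shebang("echo no directive here\n"))
```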


Security issues

The default behavior of File Explorer, the file browser provided with Microsoft Windows, is for filename extensions to not be displayed. Malicious users have tried to spread computer viruses and computer worms by using file names formed like LOVE-LETTER-FOR-YOU.TXT.vbs. The hope is that this will appear as LOVE-LETTER-FOR-YOU.TXT, a harmless text file, without alerting the user to the fact that it is a harmful computer program, in this case written in VBScript. ReactOS Explorer, by contrast, displays filename extensions by default.

Later Windows versions (starting with Windows XP Service Pack 2 and Windows Server 2003) included customizable lists of filename extensions that should be considered "dangerous" in certain "zones" of operation, such as when downloaded from the web or received as an e-mail attachment. Modern antivirus software systems also help to defend users against such attempted attacks where possible.

Some viruses take advantage of the similarity between the ".com" top-level domain and the ".COM" filename extension by emailing malicious, executable command-file attachments under names superficially similar to URLs (e.g., "myparty.yahoo.com"), with the effect that unaware users click on email-embedded links that they think lead to websites but actually download and execute the malicious attachments.

There have been instances of malware crafted to exploit vulnerabilities in some Windows applications which could cause a stack-based buffer overflow when opening a file with an overly long, unhandled filename extension.

The filename extension is just a marker, and the content of the file does not have to match it. This can be used to disguise malicious content. When trying to identify a file for security reasons, it is therefore considered dangerous to rely on the extension alone, and a proper analysis of the content of the file is preferred. For example, on UNIX-derived systems, it is not uncommon to find files with no extensions at all, as commands such as file are meant to be used instead, and will read the file's header to determine its content.
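
The two points above, that hiding the last extension changes what the user sees and that content should be checked rather than trusted from the name, can be sketched in Python as follows. The function names and the small signature table are assumptions made for illustration; the table lists only a handful of well-known leading byte sequences and is far less thorough than the file command.

```python
import os
from typing import Optional

# A few well-known leading byte sequences ("magic numbers").
# Illustrative only; real tools know many more formats.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"%PDF-": "pdf",
    b"\x1f\x8b": "gz",
    b"PK\x03\x04": "zip",
    b"MZ": "exe",
}


def displayed_name(filename: str) -> str:
    """Name as it would appear if the final extension were hidden."""
    return os.path.splitext(filename)[0]


def sniff(header: bytes) -> Optional[str]:
    """Guess a type from the first bytes of a file, or None if unknown."""
    for magic, kind in SIGNATURES.items():
        if header.startswith(magic):
            return kind
    return None


def extension_matches_content(filename: str, header: bytes) -> bool:
    """True only when the sniffed type equals the final extension."""
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    kind = sniff(header)
    return kind is not None and kind == ext


print(displayed_name("LOVE-LETTER-FOR-YOU.TXT.vbs"))
print(extension_matches_content("photo.png", b"MZ\x90\x00"))
```

In this sketch a file named photo.png whose first bytes are those of a DOS/Windows executable is reported as a mismatch, which is the kind of disagreement between name and content that the paragraph above warns about.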


Alternatives

In many Internet protocols, such as HTTP and MIME email, the type of a bitstream is stated as the media type, or MIME type, of the stream, rather than a filename extension. This is given in a line of text preceding the stream, such as Content-type: text/plain. There is no standard mapping between filename extensions and media types, resulting in possible mismatches in interpretation between authors, web servers, and client software when transferring files over the Internet. For instance, a content author may specify the extension svgz for a compressed Scalable Vector Graphics file, but a web server that does not recognize this extension may not send the proper content type image/svg+xml and its required compression header, leaving web browsers unable to correctly interpret and display the image.

BeOS, whose BFS file system supports extended attributes, would tag a file with its media type as an extended attribute. The KDE and GNOME desktop environments associate a media type with a file by examining both the filename suffix and the contents of the file, in the fashion of the file command, as a heuristic. They choose the application to launch when a file is opened based on that media type, reducing the dependency on filename extensions. macOS uses both filename extensions and media types, as well as file type codes, to select a Uniform Type Identifier by which to identify the file type internally.
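
Because there is no standard mapping between extensions and media types, each piece of software carries its own table. As a sketch of what such a table looks like in practice, the following uses Python's standard mimetypes module; the wrapper name media_type_for is hypothetical, and the results reflect only Python's built-in table, not what any particular web server or desktop environment would report.

```python
import mimetypes

# A MimeTypes instance created without extra files uses only the table
# shipped with Python, so results do not depend on system configuration.
table = mimetypes.MimeTypes()


def media_type_for(filename: str):
    """Return (media type, content encoding) guessed from the name alone."""
    return table.guess_type(filename)


for name in ["readme.txt", "files.tar.gz", "drawing.svgz", "data.unknownext"]:
    print(name, media_type_for(name))
```

The second element of each result is the content encoding (for example gzip), which corresponds to the separate compression header mentioned above; a name with an unrecognized extension yields no media type at all.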


See also

* file (command)
* List of file formats
* List of filename extensions
* Metadata
* .properties


External links

* Database of filename extensions at FileInfo.com