Mbox

Mbox is a generic term for a family of related file formats used for holding collections of email messages. It was first implemented in Fifth Edition Unix. All messages in an mbox mailbox are concatenated and stored as plain text in a single file. Each message starts with the four characters "From" followed by a space (the so-called "From_ line") and the sender's email address. RFC 4155 specifies that a UTC timestamp follows, separated by another space. A format similar to mbox is the MH Message Handling System. Other systems, such as Microsoft Exchange Server and the Cyrus IMAP server, store mailboxes in centralized databases managed by the mail system and not directly accessible by individual users. The Maildir mailbox format is often cited as an alternative to the mbox format for networked email storage systems.
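The From_ line structure described above can be illustrated with a short Python sketch. It assumes the simplest reading of the format: every line beginning with "From " starts a new message, and the line holds the sender followed by an asctime()-style timestamp. The function names `split_mbox` and `parse_from_line` are hypothetical, and the sketch deliberately ignores From-quoting and Content-Length headers.

```python
from datetime import datetime


def split_mbox(text: str) -> list[tuple[str, str]]:
    """Split mbox text into (From_ line, message) pairs.

    Minimal sketch: every line beginning with "From " is treated as a
    separator, with no From-quoting or Content-Length handling. The empty
    line that precedes the next From_ line stays attached to the message.
    """
    messages: list[tuple[str, str]] = []
    from_line = None
    lines: list[str] = []
    for line in text.splitlines(keepends=True):
        if line.startswith("From "):
            if from_line is not None:
                messages.append((from_line, "".join(lines)))
            from_line = line.rstrip("\r\n")
            lines = []
        elif from_line is not None:  # ignore anything before the first From_ line
            lines.append(line)
    if from_line is not None:
        messages.append((from_line, "".join(lines)))
    return messages


def parse_from_line(from_line: str) -> tuple[str, datetime]:
    """Return (sender, timestamp) from a line such as
    'From MAILER-DAEMON Fri Jul  8 12:08:34 2011'.

    The returned datetime is naive and should be read as UTC.
    """
    _, sender, stamp = from_line.split(" ", 2)
    # Collapse the asctime() space padding ("Jul  8") before parsing.
    when = datetime.strptime(" ".join(stamp.split()), "%a %b %d %H:%M:%S %Y")
    return sender, when
```

Real readers must also undo whatever From-quoting the writer applied; this sketch only shows where message boundaries fall.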


Mail storage protocols

Unlike the Internet protocols used to exchange email, the format used to store email has never been formally defined through the RFC standardization mechanism; it has been left entirely to the developers of email clients. However, the POSIX standard defined a loose framework in conjunction with the mailx program. In 2005, the application/mbox media type was standardized as RFC 4155, which hinted that mbox stores mailbox messages in their original Internet Message (RFC 2822) format, except for the newline character used, seven-bit clean data storage, and the requirement that each newly added message be terminated with a completely empty line within the mbox database.
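A writer following the RFC 4155 conventions above might build each appended entry roughly as below. This is a sketch under stated assumptions: the helper name `format_mbox_entry` is hypothetical, the local newline convention is taken to be LF, the timestamp uses Python's `time.asctime()` on a UTC `struct_time`, and no From-quoting of the body is performed.

```python
import time
from typing import Optional


def format_mbox_entry(sender: str, message: str, when: Optional[float] = None) -> str:
    """Build one mbox entry: From_ line, message text, terminating empty line.

    Sketch only: the timestamp is an asctime()-style UTC string, CRLF line
    endings from the wire format are converted to LF, and no "From " quoting
    is applied to the body.
    """
    stamp = time.asctime(time.gmtime(time.time() if when is None else when))
    text = message.replace("\r\n", "\n")
    if not text.endswith("\n"):
        text += "\n"
    # The extra "\n" is the completely empty line that ends the entry.
    return f"From {sender} {stamp}\n{text}\n"
```

For example, `format_mbox_entry("a@example.com", "Subject: x\r\n\r\nhi", when=0)` yields `"From a@example.com Thu Jan  1 00:00:00 1970\nSubject: x\n\nhi\n\n"`, which can be appended to the mailbox file as-is.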


Mbox family

The mbox format uses a single blank line followed by the string "From " (with a space) to delimit messages; this can create ambiguities if a message contains the same sequence in its text. Over the years, four popular but incompatible variants arose: mboxo, mboxrd, mboxcl, and mboxcl2. The naming scheme was developed by Daniel J. Bernstein, Rahul Dhesi, and others in 1996. Each originated from a different version of Unix. mboxcl and mboxcl2 originated from the file format used by Unix System V Release 4 mail tools. mboxrd was invented by Rahul Dhesi et al. as a rationalization of mboxo and subsequently adopted by some Unix mail tools, including qmail.

All these variants have the problem that the content of a message sometimes must be modified to remove ambiguities, as shown below, so applications have to know which quoting rule was used in order to reverse it correctly, which turned out to be impractical. Using MIME and choosing a content-transfer-encoding that quotes "From " lines in a standard-compliant fashion ensures that the message content doesn't need to be changed, only its MIME representation. Therefore, checksums remain constant, a necessary precondition for supporting S/MIME and Pretty Good Privacy. Applications that newly create messages and store them in mbox database files will likely use this approach to decouple message content from the database storage format.

mboxo and mboxrd locate the start of each message by scanning for "From " lines found before the email message headers. If a "From " string occurs at the beginning of a line in either the header or the body of a message (a mail standard violation for the former, but not for the latter), the message must be modified before it is stored in an mbox file, or the line will be taken as a message boundary. To avoid misinterpreting a "From " string at the beginning of a body line as the start of a new email, some systems "From-munge" the message, typically by prepending a greater-than sign:

    >From my point of view...

In the mboxo format, such lines are irreversibly ambiguous, which can lead to corruption of the message. If a line already contained ">From " at the beginning (such as in a quotation), it is unchanged when written. When the mailbox is subsequently read by the mail software, the leading ">" is erroneously removed. The mboxrd format solves this by converting "From " to ">From ", ">From " to ">>From ", and so on. The transformation is then always reversible. Example:

    From MAILER-DAEMON Fri Jul 8 12:08:34 2011
    From: Author
    To: Recipient
    Subject: Sample message 1

    This is the body.
    >From (should be escaped).
    There are 3 lines.

    From MAILER-DAEMON Fri Jul 8 12:08:34 2011
    From: Author
    To: Recipient
    Subject: Sample message 2

    This is the second body.

The mboxcl and mboxcl2 formats use a Content-Length: header to determine each message's length and thereby locate the next real "From " line. mboxcl still quotes "From " lines in the messages themselves as mboxrd does, while mboxcl2 doesn't.
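The difference between the mboxo and mboxrd quoting rules described above can be made concrete with regular expressions. The following is a minimal Python sketch; the four function names are hypothetical, and the sketch covers only the body quoting step, not message splitting.

```python
import re

# mboxo: only lines that start exactly with "From " are quoted; unquoting
# strips one ">" from every ">From " line, so it cannot tell the two apart.
_MBOXO_QUOTE = re.compile(r"^From ", re.MULTILINE)
_MBOXO_UNQUOTE = re.compile(r"^>From ", re.MULTILINE)

# mboxrd: any run of ">" followed by "From " gains one ">" on write and
# loses exactly one on read, so the round trip is lossless.
_MBOXRD_QUOTE = re.compile(r"^(>*From )", re.MULTILINE)
_MBOXRD_UNQUOTE = re.compile(r"^>(>*From )", re.MULTILINE)


def mboxo_quote(body: str) -> str:
    return _MBOXO_QUOTE.sub(">From ", body)


def mboxo_unquote(body: str) -> str:
    return _MBOXO_UNQUOTE.sub("From ", body)


def mboxrd_quote(body: str) -> str:
    return _MBOXRD_QUOTE.sub(r">\1", body)


def mboxrd_unquote(body: str) -> str:
    return _MBOXRD_UNQUOTE.sub(r"\1", body)
```

With `body = "From my point of view...\n>From a quoted reply.\n"`, the mboxo round trip `mboxo_unquote(mboxo_quote(body))` returns `"From my point of view...\nFrom a quoted reply.\n"`, silently dropping the quotation's ">". The mboxrd writer produces `">From my point of view...\n>>From a quoted reply.\n"`, and `mboxrd_unquote` restores the original body exactly.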


Modified mbox

Some email clients use a modification of the mbox format for their mail folders.
* Eudora used an mboxo variation in which the sender's email address is replaced by the constant string "???@???". Most mbox clients store incoming messages as received; Eudora, by contrast, separates out attachments embedded in the message, storing them as separate individual files in one folder.
* The Mozilla family of email clients (Mozilla, Netscape, Thunderbird, et al.) uses an mboxrd variation with more complex From line quoting rules.


File locking

Because more than one message is stored in a single file, some form of file locking is needed to avoid the corruption that can result from two or more processes modifying the mailbox simultaneously. This could happen, for example, if a network email delivery program delivers a new message while a mail reader is deleting an existing one. Different mbox implementations have used various mutually incompatible locking mechanisms, including fcntl() and lockf(). These do not work well with network-mounted file systems such as the Network File System (NFS), which is why Unix traditionally used additional "dot lock" files, which could be created atomically even over NFS. Mbox files should also be locked while they are being read; otherwise, the reader may see corrupted message contents if another process is modifying the mbox at the same time, even though no actual file corruption occurs.
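The dot-lock approach described above can be sketched in Python. Assumptions: the lock file is named by appending ".lock" to the mailbox path (a common convention), the helper `dotlock`, its temporary-file naming, and its retry parameters are hypothetical, and the mailbox's directory must be writable. Because `O_EXCL` file creation was historically unreliable over NFS, the sketch uses the `link()` technique: create a uniquely named file, hard-link it to the lock name, and check the link count. Python's standard `fcntl` module exposes `fcntl.lockf()` for the kernel-lock side, which could be combined with this.

```python
import os
import socket
import time
from contextlib import contextmanager


@contextmanager
def dotlock(mbox_path: str, retries: int = 10, delay: float = 0.5):
    """Hold "<mbox_path>.lock" for the duration of a with-block.

    Sketch of the link()-based dot-lock technique: create a uniquely named
    temporary file, hard-link it to the lock name, and treat the lock as
    acquired only if the temporary file's link count is 2.
    """
    lock_path = mbox_path + ".lock"
    # Hypothetical unique name: hostname plus PID, so clients on different
    # NFS hosts do not collide.
    tmp_path = f"{lock_path}.{socket.gethostname()}.{os.getpid()}"
    fd = os.open(tmp_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    os.close(fd)
    try:
        for _ in range(retries):
            try:
                os.link(tmp_path, lock_path)
            except FileExistsError:
                pass  # someone else holds the lock, or NFS misreported success
            # Trust the link count rather than link()'s result, since a
            # retransmitted NFS request can report failure after succeeding.
            if os.stat(tmp_path).st_nlink == 2:
                break
            time.sleep(delay)
        else:
            raise TimeoutError(f"could not acquire {lock_path}")
    finally:
        os.unlink(tmp_path)  # the lock file survives as a separate link
    try:
        yield lock_path
    finally:
        os.unlink(lock_path)
```

A delivery agent or reader would wrap its access as `with dotlock("/var/mail/user"): ...`. The sketch does not detect or break stale locks left by crashed processes, which real tools must handle.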


As a patch format

In open source development, it is common to send patches in the diff format to a mailing list for discussion. The diff format allows irrelevant "headers", such as mbox data, to be added. Version control systems like git support generating mbox-formatted patches and sending them to the list as emails in a thread.
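Because such a patch series is an ordinary mbox, it can be read with generic mailbox tools. The sketch below uses Python's standard `mailbox` module to list the subject of each patch. The helper name `patch_subjects` is hypothetical, and the sample series in the test is made up for illustration; `git format-patch` starts each patch with a From_ line holding the commit hash and the fixed placeholder date "Mon Sep 17 00:00:00 2001".

```python
import mailbox


def patch_subjects(path: str) -> list[str]:
    """Return the Subject header of every message in an mbox patch series."""
    # mailbox.mbox splits the file on lines beginning with "From ".
    box = mailbox.mbox(path, create=False)
    try:
        return [msg["Subject"] for msg in box]
    finally:
        box.close()
```

For example, `git format-patch -2 --stdout > series.mbox` writes the last two commits as one mbox file, and `patch_subjects("series.mbox")` then returns their "[PATCH 1/2] ..." and "[PATCH 2/2] ..." subject lines.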


See also

* Maildir
* MIX (email)
* MH Message Handling System




Further reading


* qmail mbox manual page
* Internet Mail Consortium (standards body)