Bogofilter

Bogofilter is a mail filter that classifies e-mail as spam or ham (non-spam) by a statistical analysis of the message's header and content (body). The program is able to learn from the user's classifications and corrections. It was originally written by Eric S. Raymond after he read Paul Graham's article "A Plan for Spam", and is now maintained, together with a group of contributors, by David Relson, Matthias Andree and Greg Louis.

The statistical technique used is known as Bayesian filtering. Bogofilter's primary algorithm uses the f(w) parameter and the Fisher inverse chi-square technique described by Gary Robinson.

Bogofilter may be run by a mail delivery agent (MDA) or mail client to classify messages as they are delivered to recipient mailboxes, or be used by a mail transfer agent (MTA) to classify messages as they are received from the sending SMTP server.

Bogofilter examines tokens in the message body and header, and refers to wordlists stored by Berkeley DB, SQLite or QDBM to calculate a probability score that a new message is spam.

Bogofilter provides processing for plain text and HTML, and supports reading multi-part MIME messages, including base64, quoted-printable, and uuencoded text or HTML. Bogofilter ignores non-text attachments, such as images.

It is possible to tune Bogofilter's statistical algorithms by modifying various coefficients and other settings in its configuration file, or by using the automated bogotune utility included with the software, which attempts to optimise various coefficients to maximise filtering efficiency for a particular corpus of spam and non-spam.

Standard tests at TREC 2005 show that Bogofilter compares well to its competitors SpamBayes, CRM114 and DSPAM. Other competitors include, but are not limited to, SpamProbe and QSF.

Bogofilter is written in C, and runs on Linux, FreeBSD, NetBSD, OpenBSD, Solaris, Mac OS X, HP-UX, AIX and other platforms. It is released under the GNU GPL.
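
The f(w) parameter and the Fisher inverse chi-square combination mentioned above can be illustrated with a short Python sketch. It follows the general shape of the published technique and is not Bogofilter's own C implementation. The function names (robinson_f, chi2q, fisher_score) are hypothetical, and the values s=1.0 and x=0.5 are illustrative, not Bogofilter's configured defaults. The sketch also skips details such as ignoring tokens whose score is close to 0.5.

```python
import math

def robinson_f(spam_count, ham_count, n_spam, n_ham, s=1.0, x=0.5):
    """Smoothed spam probability f(w) for one token.

    s is the weight given to the prior and x is the probability assumed
    for a token that has never been seen (both illustrative values).
    """
    n = spam_count + ham_count
    if n == 0:
        return x
    spam_ratio = spam_count / max(n_spam, 1)
    ham_ratio = ham_count / max(n_ham, 1)
    p = spam_ratio / (spam_ratio + ham_ratio)
    return (s * x + n * p) / (s + n)

def chi2q(x2, df):
    """Probability that a chi-square variable with even df exceeds x2."""
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def fisher_score(token_probs):
    """Combine per-token f(w) values into one score between 0 and 1."""
    n = len(token_probs)
    if n == 0:
        return 0.5  # no evidence either way
    # Sum logarithms instead of multiplying to avoid underflow.
    ln_spam = sum(math.log(1.0 - f) for f in token_probs)
    ln_ham = sum(math.log(f) for f in token_probs)
    spam_evidence = 1.0 - chi2q(-2.0 * ln_spam, 2 * n)
    ham_evidence = 1.0 - chi2q(-2.0 * ln_ham, 2 * n)
    return (spam_evidence - ham_evidence + 1.0) / 2.0
```

A score near 1 suggests spam, a score near 0 suggests ham, and a score near 0.5 means the evidence is weak or conflicting. In a real filter, the per-token counts passed to robinson_f would come from the stored wordlists.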


Email clients that can use Bogofilter

The following email clients are known to support Bogofilter as a spam filtering backend:

* GNOME Evolution
* Claws Mail
* KMail
* Mutt
* Alpine
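
A client or delivery script that uses Bogofilter as a backend usually has to act on the verdict recorded for each message. The Python sketch below assumes that Bogofilter has been run in a mode that annotates the message with an X-Bogosity header resembling "X-Bogosity: Spam, tests=bogofilter, spamicity=0.999999, version=1.2.5". The exact layout depends on the Bogofilter version and configuration, so the pattern is an assumption, and parse_bogosity is a hypothetical helper name.

```python
import re
from email import message_from_string

# Assumed header layout: a verdict word, optionally followed by a
# "spamicity=<number>" field. Real installations may differ.
_BOGOSITY = re.compile(
    r"^(?P<verdict>\w+)(?:,.*?spamicity=(?P<score>\d+(?:\.\d+)?))?",
    re.S,
)

def parse_bogosity(raw_message):
    """Return (verdict, spamicity) from the X-Bogosity header, or None."""
    msg = message_from_string(raw_message)
    header = msg.get("X-Bogosity")
    if header is None:
        return None  # message was not annotated
    match = _BOGOSITY.match(header.strip())
    if match is None:
        return None
    score = match.group("score")
    return match.group("verdict"), (float(score) if score else None)
```

A client could then file messages whose verdict is "Spam" into a junk folder and leave the rest in the inbox.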


See also

* Blacklist
* Greylisting
* Whitelist
* Tarpit


External links


* Official homepage
* Bogofilter on Freshmeat
* A Plan for Spam: an essay by Paul Graham discussing the main ideas behind this program

This article, or an earlier revision of it, was edited from bogofilter's homepage.