Apache SpamAssassin is a

computer program A computer program is a sequence or set of instructions in a programming language for a computer to execute. Computer programs are one component of software, which also includes documentation and other intangible components. A computer program ...

used for e-mail spam filtering. It uses a variety of spam-detection techniques, including

DNS The Domain Name System (DNS) is a hierarchical and distributed naming system for computers, services, and other resources in the Internet or other Internet Protocol (IP) networks. It associates various information with domain names assigned to ...

and

fuzzy checksum A checksum is a small-sized block of data derived from another block of digital data for the purpose of detecting errors that may have been introduced during its transmission or storage. By themselves, checksums are often used to verify dat ...

techniques, Bayesian filtering, external programs, blacklists and online databases. It is released under the Apache License 2.0 and is a part of the

Apache Foundation The Apache Software Foundation (ASF) is an American nonprofit corporation (classified as a 501(c)(3) organization in the United States) to support a number of open source software projects. The ASF was formed from a group of developers of the ...

since 2004. The program can be integrated with the

mail server Within the Internet email system, a message transfer agent (MTA), or mail transfer agent, or mail relay is software that transfers electronic mail messages from one computer to another using SMTP. The terms mail server, mail exchanger, and MX host ...

to automatically filter all mail for a site. It can also be run by individual users on their own mailbox and integrates with several mail programs. Apache SpamAssassin is highly configurable; if used as a system-wide filter it can still be configured to support per-user preferences.

History

Apache SpamAssassin was created by Justin Mason, who had maintained a number of patches against an earlier program named ''filter.plx'' by Mark Jeftovic, which in turn was begun in August 1997. Mason rewrote all of Jeftovic's code from scratch and uploaded the resulting codebase to

SourceForge SourceForge is a web service that offers software consumers a centralized online location to control and manage open-source software projects and research business software. It provides source code repository hosting, bug tracking, mirrorin ...

on April 20, 2001. In Summer 2004 the project became an

Apache Software Foundation The Apache Software Foundation (ASF) is an American nonprofit corporation (classified as a 501(c)(3) organization in the United States) to support a number of open source software projects. The ASF was formed from a group of developers of the A ...

project and later officially renamed to ''Apache SpamAssassin''. The SpamAssassin 3.4.2 release in September 2019 was the first in over three years, but the developers say that "The project has picked up a new set of developers and is moving forward again.". In December 2019, version 3.4.3 of SpamAssassin was released. In April, 2021, version 3.4.6 of SpamAssassin was released. It was announced that development of version 4.0.0 would become project's focus.

Methods of usage

Apache SpamAssassin is a

Perl Perl is a family of two high-level, general-purpose, interpreted, dynamic programming languages. "Perl" refers to Perl 5, but from 2000 to 2019 it also referred to its redesigned "sister language", Perl 6, before the latter's name was offici ...

-based application ( in

CPAN The Comprehensive Perl Archive Network (CPAN) is a repository of over 250,000 software modules and accompanying documentation for 39,000 distributions, written in the Perl programming language by over 12,000 contributors. ''CPAN'' can denote eith ...

) which is usually used to filter all incoming mail for one or several users. It can be run as a standalone application or as a subprogram of another application (such as a

Milter Milter (portmanteau for ''mail filter'') is an extension to the widely used open source mail transfer agents (MTA) Sendmail and Postfix. It allows administrators to add mail filters for filtering spam or viruses in the mail-processing chain. In ...

, SA-Exim, Exiscan,

MailScanner MailScanner is an open-source software, open source email security system for use on Unix email gateways and was first released in 2001. It protects against Computer virus, viruses, email spam, spam, malware, and phishing. It is distributed unde ...

MIMEDefang MIMEDefang is a GNU General Public License, GPL software license, licensed framework for e-mail filtering, filtering e-mail. It uses sendmail's "Milter" API, some C (programming language), C glue code, and some Perl code to let the user write hig ...

Amavis Amavis is an open-source content filter for electronic mail, implementing mail message transfer, decoding, some processing and checking, and interfacing with external content filters to provide protection against spam and viruses and other ma ...

) or as a

client Client(s) or The Client may refer to: * Client (business) * Client (computing), hardware or software that accesses a remote service on another computer * Customer or client, a recipient of goods or services in return for monetary or other valuable ...

() that communicates with a

daemon Daimon or Daemon (Ancient Greek: , "god", "godlike", "power", "fate") originally referred to a lesser deity or guiding spirit such as the daimons of ancient Greek religion and mythology and of later Hellenistic religion and philosophy. The word ...

(). The client/server or embedded mode of operation has performance benefits, but under certain circumstances may introduce additional security risks. Typically either variant of the application is set up in a generic

mail filter Email filtering is the processing of email to organize it according to specified criteria. The term can apply to the intervention of human intelligence, but most often refers to the automatic processing of messages at an SMTP server, possibly appl ...

program, or it is called directly from a

mail user agent The mail or post is a system for physically transporting postcards, letters, and parcels. A postal service can be private or public, though many governments place restrictions on private systems. Since the mid-19th century, national postal syst ...

that supports this, whenever new mail arrives. Mail filter programs such as

procmail procmail is an email server software component — specifically, a message delivery agent (MDA). It was one of the earliest mail filter programs. It is typically used in Unix-like mail systems, using the mbox and Maildir storage formats. procm ...

can be made to

pipe Pipe(s), PIPE(S) or piping may refer to: Objects * Pipe (fluid conveyance), a hollow cylinder following certain dimension rules ** Piping, the use of pipes in industry * Smoking pipe ** Tobacco pipe * Half-pipe and quarter pipe, semi-circula ...

all incoming mail through Apache SpamAssassin with an adjustment to a user's file.

Operation

Apache SpamAssassin comes with a large set of rules which are applied to determine whether an email is spam or not. Most rules are based on

regular expression A regular expression (shortened as regex or regexp; sometimes referred to as rational expression) is a sequence of characters that specifies a search pattern in text. Usually such patterns are used by string-searching algorithms for "find" or ...

s that are matched against the body or header fields of the message, but Apache SpamAssassin also employs a number of other spam-fighting techniques. The rules are called "tests" in the SpamAssassin documentation. Each test has a score value that will be assigned to a message if it matches the test's criteria. The scores can be positive or negative, with positive values indicating "spam" and negative "ham" (non-spam messages). A message is matched against all tests and Apache SpamAssassin combines the results into a global score which is assigned to the message. The higher the score, the higher the probability that the message is spam. Apache SpamAssassin has an internal (configurable) score threshold to classify a message as spam. Usually a message will only be considered as spam if it matches multiple criteria; matching just a single test will not usually be enough to reach the threshold. If Apache SpamAssassin considers a message to be spam, it can be further rewritten. In the default configuration, the content of the mail is appended as a

MIME Multipurpose Internet Mail Extensions (MIME) is an Internet standard that extends the format of email messages to support text in character sets other than ASCII, as well as attachments of audio, video, images, and application programs. Message ...

attachment, with a brief excerpt in the message body, and a description of the tests which resulted in the mail being classified as spam. If the score is lower than the defined settings, by default the information about the tests passed and total score is still added to the email headers and can be used in post-processing for less severe actions, such as tagging the mail as suspicious. Apache SpamAssassin allows for a per-user configuration of its behavior, even if installed as system-wide service; the configuration can be read from a file or a database. In their configuration users can specify individuals whose emails are never considered spam, or change the scores for certain rules. The user can also define a list of languages which they want to receive mail in, and Apache SpamAssassin then assigns a higher score to all mails that appear to be written in another language. Apache SpamAssassin is based on heuristics (pattern recognition), and such software exhibits false positives and false negatives.

Network-based filtering methods

Apache SpamAssassin also supports: * DNS-based blacklists and DNS-based whitelists * Fuzzy-checksum-based spam detection filters such as the

Distributed Checksum Clearinghouse Distributed Checksum Clearinghouse (also referred to as DCC) is a method of spam email detection. The basic logic in DCC is that most spam mails are sent to many recipients. The same message body appearing many times is therefore bulk email. DCC id ...

Vipul's Razor
and the Cloudmark Authority plugins (commercial) *

Hashcash Hashcash is a proof-of-work system used to limit email spam and denial-of-service attacks, and more recently has become known for its use in bitcoin (and other cryptocurrencies) as part of the mining algorithm. Hashcash was proposed in 1997 by Adam ...

email stamps based on

proof-of-work Proof of work (PoW) is a form of cryptographic proof in which one party (the ''prover'') proves to others (the ''verifiers'') that a certain amount of a specific computational effort has been expended. Verifiers can subsequently confirm this exp ...

Sender Policy Framework Sender Policy Framework (SPF) is an email authentication method designed to detect forging sender addresses during the delivery of the email. SPF alone, though, is limited to detecting a forged sender claim in the envelope of the email, which is ...

and

DomainKeys Identified Mail DomainKeys Identified Mail (DKIM) is an email authentication method designed to detect forged sender addresses in email (email spoofing), a technique often used in phishing and email spam. DKIM allows the receiver to check that an email claimed ...

URI Uri may refer to: Places * Canton of Uri, a canton in Switzerland * Úri, a village and commune in Hungary * Uri, Iran, a village in East Azerbaijan Province * Uri, Jammu and Kashmir, a town in India * Uri (island), an island off Malakula Islan ...

blacklists such as

SURBL SURBL (previously stood for Spam URI RBL) is a collection of URI DNSBL lists of Uniform Resource Identifier (URI) hosts, typically web site domains, that appear in unsolicited messages. SURBL can be used to search incoming e-mail message bodies fo ...

o
URIBL
which track spam websites More methods can be added reasonably easily by writing a Perl plug-in for Apache SpamAssassin.

Bayesian filtering

Apache SpamAssassin reinforces its rules through Bayesian filtering where a user or administrator "feeds" examples of good (ham) and bad (spam) into the filter in order to learn the difference between the two. For this purpose, Apache SpamAssassin provides the command-line tool , which can be instructed to learn a single mail or an entire mailbox as either ham or spam. Typically, the user will move unrecognized spam to a separate folder, and then run on the folder of non-spam and on the folder of spam separately. Alternatively, if the mail user agent supports it, can be called for individual emails. Regardless of the method used to perform the learning, SpamAssassin's Bayesian test will help score future e-mails based on this learning to improve the accuracy.

Licensing

Apache SpamAssassin is free/

open source software Open-source software (OSS) is computer software that is released under a license in which the copyright holder grants users the rights to use, study, change, and distribute the software and its source code to anyone and for any purpose. Open ...

, licensed under the Apache License 2.0. Versions prior to 3.0 are dual-licensed under the

Artistic License Artistic license (alongside more contextually-specific derivative terms such as poetic license, historical license, dramatic license, and narrative license) refers to deviation from fact or form for artistic purposes. It can include the alterat ...

and the

GNU General Public License The GNU General Public License (GNU GPL or simply GPL) is a series of widely used free software licenses that guarantee end users the Four Freedoms (Free software), four freedoms to run, study, share, and modify the software. The license was th ...

sa-compile

sa-compile is a utility distributed with Apache SpamAssassin that compiles a SpamAssassin ruleset into a

deterministic finite automaton In the theory of computation, a branch of theoretical computer science, a deterministic finite automaton (DFA)—also known as deterministic finite acceptor (DFA), deterministic finite-state machine (DFSM), or deterministic finite-state automa ...

that allows Apache SpamAssassin to use processor power more efficiently.

Testing Apache SpamAssassin

Apache SpamAssassin is designed to trigger on the

GTUBE The GTUBE ("Generic Test for Unsolicited Bulk Email") is a 68-byte test string used to test anti-e-mail spam, spam systems, in particular those based on SpamAssassin. In SpamAssassin, it carries an antispam score of 1000 by default, which would be ...

, a 68-byte string similar to the antivirus

EICAR test file The EICAR Anti-Virus Test File or EICAR test file is a computer file that was developed by the European Institute for Computer Antivirus Research (EICAR) and Computer Antivirus Research Organization (CARO), to test the response of computer ant ...

. If this string is inserted in an RFC 5322 formatted message and passed through the Apache SpamAssassin engine, Apache SpamAssassin will trigger with a weight of 1000.

Notes

References

* *

External links

*
Apache SpamAssassin Wiki

Apache SpamAssassin Rule Updates Wiki
Automatically updating Apache SpamAssassin
KAM.cf
KAM Ruleset for Apache SpamAssassin {{DEFAULTSORT:Spamassassin

SpamAssassin Apache SpamAssassin is a computer program used for anti-spam techniques, e-mail spam filtering. It uses a variety of spam-detection techniques, including Domain Name System, DNS and fuzzy checksum techniques, Bayesian spam filtering, Bayesian filt ...

Cross-platform software Free email software Free software programmed in Perl Perl software Spam filtering Spamming Email-related software for Linux 2001 software