Bayesian Poisoning
Bayesian poisoning is a technique used by e-mail spammers to degrade the effectiveness of spam filters that rely on Bayesian spam filtering. Bayesian filtering uses Bayesian probability to determine whether an incoming mail is spam or not. The spammer hopes that adding random (or even carefully selected) words that are unlikely to appear in a spam message will cause the spam filter to believe the message is legitimate, a statistical type II error (false negative). Spammers also hope to raise the filter's false positive rate (statistical type I errors) by turning previously innocent words into spammy words in the Bayesian database: a user who trains their spam filter on a poisoned message tells the filter that the words added by the spammer are a good indication of spam.

Empirical results
Graham-Cumming
At the Spam Conference held at MIT in 2004, John Graham-Cumming presented two possible attacks on POPFile's Bayesian eng ...
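The false-negative effect described above can be illustrated with a toy scorer. This is only a sketch under stated assumptions: the training messages, the `spam_log_odds` function, and the add-one smoothing are hypothetical and do not reproduce POPFile or any real filter.

```python
import math
from collections import Counter

# Hypothetical toy training data, for illustration only.
SPAM = ["buy cheap pills now", "cheap pills online", "buy now cheap offer"]
HAM = ["meeting agenda for monday", "lunch on monday", "project agenda and notes"]

def train(messages):
    """Count how many messages contain each token."""
    counts = Counter()
    for msg in messages:
        counts.update(set(msg.split()))
    return counts

spam_counts, ham_counts = train(SPAM), train(HAM)

def spam_log_odds(message):
    """Sum per-token log-likelihood ratios (add-one smoothing, equal priors).

    Positive values lean towards spam, negative values towards legitimate mail.
    """
    score = 0.0
    for token in set(message.split()):
        p_spam = (spam_counts[token] + 1) / (len(SPAM) + 2)
        p_ham = (ham_counts[token] + 1) / (len(HAM) + 2)
        score += math.log(p_spam / p_ham)
    return score

original = "buy cheap pills"
# The attacker pads the message with words that this user's legitimate mail contains.
poisoned = original + " meeting agenda monday project lunch"

print(spam_log_odds(original))  # about 3.58: leans spam
print(spam_log_odds(poisoned))  # about -0.69: the padding flips the verdict
```

In this toy setting the padding only works because the added words happen to be common in the user's legitimate mail; truly random words that the filter has never seen would contribute nothing to the score.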

Spam (electronic)
Spamming is the use of messaging systems to send multiple unsolicited messages (spam) to large numbers of recipients for the purpose of commercial advertising, for the purpose of non-commercial proselytizing, for any prohibited purpose (especially the fraudulent purpose of phishing), or simply repeatedly sending the same message to the same user. While the most widely recognized form of spam is email spam, the term is applied to similar abuses in other media: instant messaging spam, Usenet newsgroup spam, Web search engine spam, spam in blogs, wiki spam, online classified ads spam, mobile phone messaging spam, Internet forum spam, junk fax transmissions, social spam, spam mobile apps, television advertising and file sharing spam. It is named after Spam, a luncheon meat, by way of a Monty Python sketch about a restaurant that has Spam in almost every dish in which Vikings annoyingly sing "Spam" repeatedly. Spamming remains economically viable because advertisers have no ...

Spam Filter
Email filtering is the processing of email to organize it according to specified criteria. The term can apply to the intervention of human intelligence, but most often refers to the automatic processing of messages at an SMTP server, possibly applying anti-spam techniques. Filtering can be applied to incoming emails as well as to outgoing ones. Depending on the calling environment, email filtering software can reject an item at the initial SMTP connection stage or pass it through unchanged for delivery to the user's mailbox. It is also possible to redirect the message for delivery elsewhere, quarantine it for further checking, modify it or 'tag' it in any other way.

Motivation
Common uses for mail filters include organizing incoming email and removal of spam and computer viruses. Mailbox providers filter outgoing email to promptly react to spam surges that may result from compromised accounts. A less common use is to inspect outgoing email at some companies to ensure that emplo ...

Bayesian Spam Filtering
Naive Bayes classifiers are a popular statistical technique of e-mail filtering. They typically use bag-of-words features to identify email spam, an approach commonly used in text classification. Naive Bayes classifiers work by correlating the use of tokens (typically words, or sometimes other things) with spam and non-spam e-mails and then using Bayes' theorem to calculate a probability that an email is or is not spam. Naive Bayes spam filtering is a baseline technique for dealing with spam that can tailor itself to the email needs of individual users and give low false positive spam detection rates that are generally acceptable to users. It is one of the oldest ways of doing spam filtering, with roots in the 1990s.

History
Bayesian algorithms were used for email filtering as early as 1996. Although naive Bayesian filters did not become popular until later, multiple programs were released in 1998 to address the growing problem of unwanted email. The first scholarly publi ...
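As a rough illustration of the token-correlation and Bayes' theorem steps described above, here is a minimal bag-of-words sketch. The class name `NaiveBayesFilter`, the whitespace tokenizer, the Laplace smoothing, and the four training messages are all assumptions made for the example, not the design of any particular filter.

```python
import math
from collections import Counter

def tokenize(text):
    """Very simple bag-of-words tokenizer (an assumption for this sketch)."""
    return text.lower().split()

class NaiveBayesFilter:
    def __init__(self):
        self.token_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record the tokens of one message under 'spam' or 'ham'."""
        self.token_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        """Return P(spam | tokens); needs at least one message of each class."""
        vocab = set(self.token_counts["spam"]) | set(self.token_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_joint = {}
        for label in ("spam", "ham"):
            counts = self.token_counts[label]
            total_tokens = sum(counts.values())
            # log P(label), the prior
            score = math.log(self.message_counts[label] / total_messages)
            for token in tokenize(text):
                # log of the Laplace-smoothed P(token | label)
                score += math.log((counts[token] + 1) / (total_tokens + len(vocab)))
            log_joint[label] = score
        # Bayes' theorem: normalise the two joint scores into a probability.
        return 1.0 / (1.0 + math.exp(log_joint["ham"] - log_joint["spam"]))

f = NaiveBayesFilter()
f.train("buy cheap pills", "spam")
f.train("cheap offer now", "spam")
f.train("monday meeting agenda", "ham")
f.train("lunch meeting monday", "ham")

print(f.spam_probability("cheap pills"))     # 6/7, about 0.857
print(f.spam_probability("monday meeting"))  # 0.1
```

Working in log space avoids underflow when a message has many tokens; the "naive" part is the assumption that tokens are independent given the class.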

Bayesian Probability
Bayesian probability is an interpretation of the concept of probability, in which, instead of frequency or propensity of some phenomenon, probability is interpreted as reasonable expectation representing a state of knowledge or as quantification of a personal belief. The Bayesian interpretation of probability can be seen as an extension of propositional logic that enables reasoning with hypotheses; that is, with propositions whose truth or falsity is unknown. In the Bayesian view, a probability is assigned to a hypothesis, whereas under frequentist inference, a hypothesis is typically tested without being assigned a probability. Bayesian probability belongs to the category of evidential probabilities; to evaluate the probability of a hypothesis, the Bayesian probabilist specifies a prior probability. This, in turn, is then updated to a posterior probability in the light of new, re ...
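The prior-to-posterior update mentioned above can be written out with Bayes' theorem. The sketch below is illustrative only: the function name `posterior` and all the numbers (a 0.5 prior, and the frequencies of the words "offer" and "meeting") are made up for the example.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Bayes' theorem for a hypothesis H and its complement.

    P(H | E) = P(E | H) P(H) / (P(E | H) P(H) + P(E | not H) P(not H))
    """
    numerator = prior * p_evidence_given_h
    evidence = numerator + (1.0 - prior) * p_evidence_given_not_h
    return numerator / evidence

# Hypothesis: "this message is spam". Start with an even prior.
belief = 0.5

# Evidence 1: the word "offer", assumed to occur in 40% of spam and 10% of legitimate mail.
after_offer = posterior(belief, 0.4, 0.1)
print(after_offer)    # about 0.8

# Evidence 2: the word "meeting", assumed to occur in 20% of spam and 60% of legitimate mail.
# The previous posterior becomes the new prior.
after_meeting = posterior(after_offer, 0.2, 0.6)
print(after_meeting)  # about 0.571 (4/7)
```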

Type II Error
In statistical hypothesis testing, a type I error is the mistaken rejection of an actually true null hypothesis (also known as a "false positive" finding or conclusion; example: "an innocent person is convicted"), while a type II error is the failure to reject a null hypothesis that is actually false (also known as a "false negative" finding or conclusion; example: "a guilty person is not convicted"). Much of statistical theory revolves around the minimization of one or both of these errors, though the complete elimination of either is a statistical impossibility if the outcome is not determined by a known, observable causal process. By selecting a low threshold (cut-off) value and modifying the alpha (α) level, the quality of the hypothesis test can be increased. The knowledge of type I errors and type II errors is widely used in medical science, biometrics and computer science. Intuitively, type I errors can be thought of as errors of commission, i.e. the researcher unluck ...
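To make the two error types concrete in a spam-filtering setting, the sketch below treats "the message is legitimate" as the null hypothesis and counts both kinds of error at several cut-off values. The scores, labels, and the `error_rates` helper are hypothetical.

```python
def error_rates(scored, threshold):
    """Return (type I rate, type II rate) for a spam-score cut-off.

    scored: list of (spam_score, is_spam) pairs.
    Null hypothesis: the message is legitimate.
    Type I error (false positive): legitimate mail flagged as spam.
    Type II error (false negative): spam let through as legitimate.
    """
    false_pos = sum(1 for s, is_spam in scored if not is_spam and s >= threshold)
    false_neg = sum(1 for s, is_spam in scored if is_spam and s < threshold)
    n_ham = sum(1 for _, is_spam in scored if not is_spam)
    n_spam = len(scored) - n_ham
    return false_pos / n_ham, false_neg / n_spam

# Hypothetical filter scores for four spam and four legitimate messages.
SCORED = [
    (0.95, True), (0.80, True), (0.55, True), (0.30, True),
    (0.60, False), (0.40, False), (0.20, False), (0.05, False),
]

for cutoff in (0.25, 0.5, 0.9):
    print(cutoff, error_rates(SCORED, cutoff))
# 0.25 -> (0.5, 0.0)
# 0.5  -> (0.25, 0.25)
# 0.9  -> (0.0, 0.75)
```

On this data, moving the cut-off trades one error type against the other: neither rate reaches zero without the other growing.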


POPFile
POPFile is a free, open-source, cross-platform mail filter originally written in Perl by John Graham-Cumming and maintained by a team of volunteers. It uses a naive Bayes classifier to filter mail. This allows the filter to "learn" and classify mail according to the user's preferences. Typically it is used to filter spam mail. It can also be used to sort mail into other user-defined "buckets" or categories; for example, the user may define a bucket into which work email is sorted. The program works in several different modes. In the most popular mode, it sets itself up as a proxy between the email client and the POP3 server. As mail is downloaded via POP3, the filter identifies and classifies mail and makes a user-defined modification to the subject line, appending the name of the appropriate bucket. The user then sets up rules in the mail client to sort the mail based on the subject line modification. An HTML-based interface can be used to instruct POPFile, allowing users to ...
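The subject-line modification described above might look roughly like the following. This is not POPFile's code (POPFile is written in Perl); it is a Python sketch using the standard `email` package, and both the `tag_subject` helper and the `[bucket]` tag format are assumptions made for illustration.

```python
from email import message_from_string

def tag_subject(raw_message, bucket):
    """Append the bucket name to the Subject header of a raw RFC 822 message."""
    msg = message_from_string(raw_message)
    subject = msg.get("Subject", "")
    tagged = f"{subject} [{bucket}]".strip()
    if "Subject" in msg:
        msg.replace_header("Subject", tagged)
    else:
        # No Subject header present: add one that carries only the tag.
        msg["Subject"] = tagged
    return msg.as_string()

raw = "From: a@example.com\nSubject: Hello\n\nBody\n"
tagged_message = tag_subject(raw, "spam")
print(tagged_message)
```

A rule in the mail client could then match on the appended tag to move the message into the corresponding folder.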

Web Bug
A web beacon (also called web bug, tracking bug, tag, web tag, page tag, tracking pixel, pixel tag, 1×1 GIF, or clear GIF) is a technique used on web pages and email to unobtrusively (usually invisibly) allow checking that a user has accessed some content. Web beacons are typically used by third parties to monitor the activity of users at a website for the purpose of web analytics or page tagging. They can also be used for email tracking. When implemented using JavaScript, they may be called JavaScript tags. Using such beacons, companies and organizations can track the online behaviour of web users. At first, the companies doing such tracking were mainly advertisers or web analytics companies; later social media sites also started to use such tracking techniques, for instance through the use of buttons that act as tracking beacons. In 2017, W3C published a candidate specification for an interface that web developers can use to create web beacons.

Overview
A web beacon is a ...


CRM114 (program)
CRM114 (full name: "The CRM114 Discriminator") is a program based upon a statistical approach for classifying data, and especially used for filtering email spam.

Origin of the name
The name comes from the CRM-114 Discriminator in the Stanley Kubrick movie Dr. Strangelove, a piece of radio equipment designed to filter out messages lacking a specific code-prefix.

Operation
While others have done statistical Bayesian spam filtering based upon the frequency of single word occurrences in email, CRM114 achieves a higher rate of spam recognition through creating hits based upon phrases up to five words in length. These phrases are used to form a Markov Random Field representing the incoming texts. With this additional contextual recognition, it is one of the more accurate spam filters available. Initial testing in 2002 by author Bill Yerazunis gave a 99.87% accuracy; Holden and TREC 2005 and 2006
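To give a feel for phrase-based features as opposed to single-word features, the sketch below lists every contiguous phrase of up to five words in a text. It is a simplification for illustration only: the `phrases` helper is hypothetical and is not CRM114's actual feature extraction.

```python
def phrases(text, max_len=5):
    """Return all contiguous word phrases of 1 to max_len words, in order."""
    words = text.split()
    found = []
    for start in range(len(words)):
        for length in range(1, max_len + 1):
            if start + length <= len(words):
                found.append(" ".join(words[start:start + length]))
    return found

features = phrases("click here to win")
print(len(features))  # 10
print(features)
```

A single-word filter would see only four features here ("click", "here", "to", "win"); the phrase features also capture word order, such as "click here" and "here to win".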

SpamBayes
SpamBayes is a Bayesian spam filter written in Python which uses techniques laid out by Paul Graham in his essay "A Plan for Spam". It has subsequently been improved by Gary Robinson and Tim Peters, among others. The most notable difference between a conventional Bayesian filter and the filter used by SpamBayes is that there are three classifications rather than two: spam, non-spam (called ham in SpamBayes), and unsure. The user trains a message as being either ham or spam; when filtering a message, the spam filters generate one score for ham and another for spam. If the spam score is high and the ham score is low, the message will be classified as spam. If the spam score is low and the ham score is high, the message will be classified as ham. If the scores are both high or both low, the message will be classified as unsure. This approach leads to a low number of false positives and false negatives ...
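The three-way decision rule described above can be sketched as follows. The `classify` function and the 0.5 boundary between "low" and "high" are assumptions for illustration; they are not SpamBayes's actual API or default settings.

```python
def classify(spam_score, ham_score, high=0.5):
    """Three-way classification from separate spam and ham scores.

    'high' is a hypothetical cut-off: scores at or above it count as high.
    """
    spam_high = spam_score >= high
    ham_high = ham_score >= high
    if spam_high and not ham_high:
        return "spam"
    if ham_high and not spam_high:
        return "ham"
    # Both high (conflicting evidence) or both low (too little evidence).
    return "unsure"

print(classify(0.9, 0.1))  # spam
print(classify(0.1, 0.9))  # ham
print(classify(0.9, 0.9))  # unsure
print(classify(0.1, 0.1))  # unsure
```

Keeping two independent scores is what lets the filter tell "strong evidence both ways" apart from "no evidence either way", both of which end up in the unsure category.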

Dalhousie University
Dalhousie University (commonly known as Dal) is a large public research university in Nova Scotia, Canada, with three campuses in Halifax, a fourth in Bible Hill, and a second medical school campus in Saint John, New Brunswick. Dalhousie offers more than 4,000 courses, and over 200 degree programs in 13 undergraduate, graduate, and professional faculties. The university is a member of the U15, a group of research-intensive universities in Canada. The institution was established as Dalhousie College, a nonsectarian institution established in 1818 by the eponymous Lieutenant Governor of Nova Scotia, George Ramsay, 9th Earl of Dalhousie, with education reforme ...