Spam mass is defined as "the measure of the impact of link spamming on a page's ranking." The concept was developed by Zoltán Gyöngyi and Hector Garcia-Molina of Stanford University in association with Pavel Berkhin and Jan Pedersen of Yahoo!. Their paper expands upon the proposed TrustRank methodology. The researchers developed a ''good core'' and a ''bad core'' of selected Web documents, from which they measured spam mass across a collection of documents. Two types of measurements, ''absolute mass'' and ''relative mass'', are used to compare groups of documents. The higher the mass measurements, the more likely the documents are to be spam.
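In the paper's framework (the notation below is an informal paraphrase, not a verbatim reproduction), the ''absolute mass'' of a page is the portion of its PageRank that cannot be attributed to the good core, and the ''relative mass'' is that portion expressed as a fraction of the page's total PageRank:

:<math>M(p) = \mathrm{PR}(p) - \mathrm{PR}^{+}(p), \qquad m(p) = \frac{M(p)}{\mathrm{PR}(p)}</math>

where <math>\mathrm{PR}(p)</math> is the ordinary PageRank of page ''p'' and <math>\mathrm{PR}^{+}(p)</math> is, roughly, its PageRank when the random jump is restricted to the good core. A relative mass close to 1 indicates that most of the page's ranking derives from links outside the good core.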


Thresholds

A threshold value is used to identify groups of documents as spam: if their relative mass exceeds the threshold, the documents are considered spam. A second threshold is applied to the PageRank values of the selected documents, so that only documents with high PageRank are labelled as spam. The purpose of the methodology is to identify spam documents whose PageRank values have been artificially inflated.
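The sketch below shows how the two thresholds might be applied in practice, assuming precomputed PageRank and good-core PageRank scores; the function and parameter names (and the threshold values) are illustrative and do not come from the paper.

```python
def label_spam(pagerank, core_pagerank, mass_threshold=0.5, pr_threshold=0.001):
    """Label pages as spam when both their relative mass and PageRank exceed thresholds.

    pagerank      -- dict mapping page id to its ordinary PageRank score
    core_pagerank -- dict mapping page id to PageRank estimated from the good core only
    """
    spam = set()
    for page, pr in pagerank.items():
        if pr < pr_threshold:
            continue  # second threshold: only high-PageRank pages are labelled
        absolute_mass = pr - core_pagerank.get(page, 0.0)
        relative_mass = absolute_mass / pr
        if relative_mass > mass_threshold:
            spam.add(page)
    return spam
```

In practice both thresholds would be tuned empirically against labelled data; the defaults above are placeholders.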


External links

* {{cite web, url= http://dbpubs.stanford.edu:8090/pub/showDoc.Fulltext?lang=en&doc=2005-33&format=pdf&compression=&name=2005-33.pdf , title=Link Spam Detection Based on Mass Estimation

