Adversarial information retrieval (adversarial IR) is a topic in information retrieval concerned with strategies for working with a data source in which some portion has been maliciously manipulated. Tasks can include gathering, indexing, filtering, retrieving, and ranking information from such a data source. Adversarial IR includes the study of methods to detect, isolate, and defeat such manipulation.

On the Web, the predominant form of such manipulation is search engine spamming (also known as spamdexing), which involves employing various techniques to disrupt the activity of web search engines, usually for financial gain. Examples of spamdexing are link-bombing, comment or referrer spam, spam blogs (splogs), and malicious tagging. Reverse engineering of ranking algorithms, advertisement blocking, click fraud, and web content filtering may also be considered forms of adversarial data manipulation.


Topics

Topics related to Web spam (spamdexing):
* Link spam
* Keyword spamming
* Cloaking
* Malicious tagging
* Spam related to blogs, including comment spam, splogs, and ping spam

Other topics:
* Click fraud detection
* Reverse engineering of a search engine's ranking algorithm
* Web content filtering
* Advertisement blocking
* Stealth crawling
* Internet trolling
* Malicious tagging or voting in social networks
* Astroturfing
* Sockpuppetry
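
As an illustration of how one of the topics listed above, keyword spamming, might be detected, the following is a minimal sketch of a term-density heuristic. It is not a method described in the sources for this article: the function names and the 0.3 threshold are assumptions chosen for the example, and practical spam detectors typically combine many more signals.

```python
import re
from collections import Counter
from typing import Tuple


def top_term_density(text: str) -> Tuple[str, float]:
    """Return the most frequent word and its share of all words in the text."""
    # Lowercase and keep only alphanumeric runs as words.
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return "", 0.0
    term, count = Counter(words).most_common(1)[0]
    return term, count / len(words)


def looks_keyword_stuffed(text: str, threshold: float = 0.3) -> bool:
    """Flag text whose single most frequent word exceeds the threshold.

    The threshold is a hypothetical value for illustration only.
    """
    _, density = top_term_density(text)
    return density > threshold


stuffed = "cheap pills cheap pills cheap cheap buy cheap pills cheap"
normal = ("Adversarial information retrieval studies how search engines "
          "cope with manipulated data")

print(top_term_density(stuffed))      # most frequent word and its density
print(looks_keyword_stuffed(stuffed))
print(looks_keyword_stuffed(normal))
```

The sketch ignores stop words and document length, so very short texts or texts dominated by common function words could be misclassified; it is meant only to make the idea of a content-based spam signal concrete.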


History

The term "adversarial information retrieval" was coined in 2000 by Andrei Broder (then Chief Scientist at AltaVista) during the Web plenary session at the TREC-9 conference (D. Hawking and N. Craswell, 2004, "Very Large Scale Retrieval and Web Search", preprint version).


See also

* Information retrieval
* Spamdexing


External links


* AIRWeb: series of workshops on Adversarial Information Retrieval on the Web
* Web Spam Challenge: competition for researchers on Web Spam Detection
* Web Spam Datasets: datasets for research on Web Spam Detection