Adversarial Information Retrieval

Adversarial information retrieval (adversarial IR) is a topic in information retrieval concerned with strategies for working with a data source in which some portion has been maliciously manipulated. Tasks can include gathering, indexing, filtering, retrieving, and ranking information from such a data source. Adversarial IR includes the study of methods to detect, isolate, and defeat such manipulation. On the Web, the predominant form of such manipulation is search engine spamming (also known as spamdexing), which involves employing various techniques to disrupt the activity of web search engines, usually for financial gain. Examples of spamdexing are link-bombing, comment or referrer spam, spam blogs (splogs), and malicious tagging. Reverse engineering of ranking algorithms, advertisement blocking, click fraud, and web content filtering may also be considered forms of adversarial data manipulation.

Topics
Topics related to Web spam (spamdexing):
* Link spam
* Keyword spammin ...

Information Retrieval
Information retrieval (IR) in computing and information science is the process of obtaining information system resources that are relevant to an information need from a collection of those resources. Searches can be based on full-text or other content-based indexing. Information retrieval is the science of searching for information in a document, searching for documents themselves, and also searching for the metadata that describes data, and for databases of texts, images, or sounds. Automated information retrieval systems are used to reduce what has been called information overload. An IR system is a software system that provides access to books, journals, and other documents, and that stores and manages those documents. Web search engines are the most visible IR applications.

Overview
An information retrieval process begins when a user or searcher enters a query into the system. Queries are formal statements of information needs, for example search strings in web search engines. In ...
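
The idea of full-text indexing mentioned above can be illustrated with a minimal sketch of an inverted index. This is only an assumption-laden toy: the names build_index and search are hypothetical, tokenization is a plain whitespace split, and a query matches a document only if every query term occurs in it. Real IR systems add ranking, stemming, and much more.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to the set of document ids containing it.

    `docs` is a dict of {doc_id: text}; both names are illustrative only.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Start from the postings of the first term, then intersect the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

For example, indexing three short documents and searching for "web search" would return only the documents that contain both words.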

Sping
Sping is short for "spam ping", and is related to pings from blogs using trackbacks, called trackback spam. Pings are messages sent from blog and publishing tools to a centralized network service (a ping server) providing notification of newly published posts or content. Spings, or ping spam, are pings that are sent from spam blogs, or are sometimes multiple pings in a short interval from a legitimate source, often tens or hundreds per minute, caused by misconfigured software or by a wish to make the content coming from the source appear fresh. Spings, like spam blogs, are increasingly problematic for the blogging community. Estimates from Weblogs.com and Matt Mullenweg's Ping-o-Matic! service have put the sping rate (the percentage of pings that are sent from spam blogs) well above 50%. A study commissioned by Ebiquity Group and conducted by the University of Maryland in 2006 confirmed that these numbers are around 75%. Since then, growth in sping has slowed, such that the portion o ...
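
The two notions described above, bursts of pings from one source and the sping rate, can be sketched as follows. This is a hedged illustration, not how any ping server actually works: the function names, the (timestamp, source) tuple layout, and the default thresholds (more than 10 pings within 60 seconds) are all assumptions chosen for the example.

```python
from collections import defaultdict, deque


def find_burst_sources(pings, window_seconds=60.0, max_pings=10):
    """Return sources that sent more than `max_pings` within any time window.

    `pings` is an iterable of (timestamp_in_seconds, source) tuples.
    The thresholds are illustrative assumptions, not established values.
    """
    recent = defaultdict(deque)  # per-source timestamps inside the window
    flagged = set()
    for timestamp, source in sorted(pings):
        window = recent[source]
        window.append(timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while window and timestamp - window[0] > window_seconds:
            window.popleft()
        if len(window) > max_pings:
            flagged.add(source)
    return flagged


def sping_rate(pings, spam_sources):
    """Percentage of pings whose source is in `spam_sources`."""
    pings = list(pings)
    if not pings:
        return 0.0
    spam = sum(1 for _, source in pings if source in spam_sources)
    return 100.0 * spam / len(pings)
```

A burst from a legitimate but misconfigured source would be flagged by the first function just like a spam blog, which mirrors the broader sense of "sping" given in the text.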

Text Retrieval Conference
The Text REtrieval Conference (TREC) is an ongoing series of workshops focusing on a list of different information retrieval (IR) research areas, or "tracks". It is co-sponsored by the National Institute of Standards and Technology (NIST) and the Intelligence Advanced Research Projects Activity (part of the office of the Director of National Intelligence), and began in 1992 as part of the TIPSTER Text program. Its purpose is to support and encourage research within the information retrieval community by providing the infrastructure necessary for large-scale evaluation of text retrieval methodologies and to increase the speed of lab-to-product transfer of technology. TREC's evaluation protocols have improved many search technologies. A 2010 study estimated that "without TREC, U.S. Internet users would have spent up to 3.15 billion additional hours using web search engines between 1999 and 2009." Hal Varian, the Chief Economist at Google, wrote that "The TREC data revitaliz ...

AltaVista
AltaVista was a Web search engine established in 1995. It became one of the most-used early search engines, but lost ground to Google and was purchased in 2003 by Yahoo!, which retained the brand but based all AltaVista searches on its own search engine. On July 8, 2013, the service was shut down by Yahoo!, and since then the domain has redirected to Yahoo!'s own search site.

Etymology
The word "AltaVista" is formed from the words for "high view" or "upper view" in Spanish (alta + vista); thus, it colloquially translates to "overview".

Origins
AltaVista was created by researchers at Digital Equipment Corporation's Network Systems Laboratory and Western Research Laboratory who were trying to provide services to make finding files on the public network easier. Paul Flaherty came up with the original idea, along with Louis Monier and Michael Burrows, who wrote the Web crawler and indexer, respectively. The name "AltaVista" was chosen in relation to the surroundings of thei ...

Andrei Broder
Andrei Zary Broder (born April 12, 1953, in Bucharest) is a distinguished scientist at Google. Previously, he was a research fellow and vice president of computational advertising for Yahoo!, and before that, the vice president of research for AltaVista. He has also worked for IBM Research as a distinguished engineer and was CTO of IBM's Institute for Search and Text Analysis.

Education and career
Broder was born in Bucharest, Romania, in 1953. His parents were medical doctors, his father a noted oncological surgeon. They emigrated to Israel in 1973, when Broder was in the second year of college in Romania, in the Electronics department at the Bucharest Polytechnic. He was accepted at Technion – Israel Institute of Technology, in the EE Department. Broder graduated from Technion in 1977, with a B.Sc. summa cum laude. He was then admitted to the PhD program at Stanford, where he initially planned to work in the systems area. His first adviser was Prof. John L. Hennessy. After ...

Sockpuppetry
A sock puppet is defined as a person whose actions are controlled by another. It is a reference to the manipulation of a simple hand puppet made from a sock, and is often used to refer to alternative online identities or user accounts used for purposes of deception. Online, it came to be used to refer to a false identity assumed by a member of an internet community who spoke to, or about, themselves while pretending to be another person. The use of the term has expanded to now include other misleading uses of online identities, such as those created to praise, defend, or support a person or organization, to manipulate public opinion, or to circumvent restrictions, such as viewing a social media account that they are blocked from, suspension, or an outright ban from a website. A significant difference between a pseudonym and a sock puppet is that the latter poses as a third party independent of the main accou ...

Astroturfing
Astroturfing is the practice of masking the sponsors of a message or organization (e.g., political, advertising, religious, or public relations) to make it appear as though it originates from and is supported by grassroots participants. It is a practice intended to give the statements or organizations credibility by withholding information about the source's financial connection. The term "astroturfing" is derived from AstroTurf, a brand of synthetic carpeting designed to resemble natural grass, as a play on the word "grassroots". The implication behind the use of the term is that instead of a "true" or "natural" grassroots effort behind the activity in question, there is a "fake" or "artificial" appearance of support.

Definition
In political science, it is defined as the process of seeking electoral victory or legislative relief for grievances by helping political actors find and mobilize a sympathetic public, and is designed to create the image of public consensus where ther ...

Social Networks
A social network is a social structure made up of a set of social actors (such as individuals or organizations), sets of dyadic ties, and other social interactions between actors. The social network perspective provides a set of methods for analyzing the structure of whole social entities as well as a variety of theories explaining the patterns observed in these structures. The study of these structures uses social network analysis to identify local and global patterns, locate influential entities, and examine network dynamics. The analysis of social networks is an inherently interdisciplinary academic field which emerged from social psychology, sociology, statistics, and graph theory. Georg Simmel authored early structural theories in sociology emphasizing the dynamics of triads and the "web of group affiliations". Jacob Moreno is credited with developing the first sociograms in the 1930s to study interpersonal relationships. These approaches were mathematically ...

Troll (Internet)
In slang, a troll is a person who posts or makes inflammatory, insincere, digressive, extraneous, or off-topic messages online (such as in social media, a newsgroup, a forum, a chat room, or an online video game), or in real life, with the intent of provoking others into displaying emotional responses, or manipulating others' perception. The behavior is typically for the troll's amusement, or to achieve a specific result such as disrupting a rival's online activities or purposefully causing confusion or harm to other users online. In this context, both the noun and the verb forms of "troll" are frequently associated with Internet discourse. Media attention in recent years has equated trolling with online harassment. The Courier-Mail and The Today Show have used "troll" to mean "a person who defaces Internet tribute sites with the aim of causing grief to families". In addition, depictions of trolling have been included in popular fictional works, such as the HBO tele ...

Web Crawling
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering). Web search engines and some other websites use Web crawling or spidering software to update their web content or indices of other sites' web content. Web crawlers copy pages for processing by a search engine, which indexes the downloaded pages so that users can search more efficiently. Crawlers consume resources on visited systems and often visit sites unprompted. Issues of schedule, load, and "politeness" come into play when large collections of pages are accessed. Mechanisms exist for public sites not wishing to be crawled to make this known to the crawling agent. For example, including a robots.txt file can request bots to index only parts of a website, or nothing at all. The number of Internet pages is extremely l ...
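
How a crawler might honor a robots.txt file can be sketched with Python's standard urllib.robotparser module. The robots.txt content, the "ExampleBot" user agent, and the example.com URLs below are made-up assumptions for illustration; a real crawler would fetch the file from the site it is about to visit.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: every bot is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_text):
    """Parse robots.txt text (already downloaded) into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def allowed(parser, user_agent, url):
    """Return True if the rules permit `user_agent` to fetch `url`."""
    return parser.can_fetch(user_agent, url)
```

A polite crawler would call allowed() before each request and skip any URL for which it returns False. Compliance is voluntary, which is part of why adversarial settings cannot rely on it.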

Content Filtering
An Internet filter is software that restricts or controls the content an Internet user is able to access, especially when utilized to restrict material delivered over the Internet via the Web, email, or other means. Content-control software determines what content will be available or be blocked. Such restrictions can be applied at various levels: a government can attempt to apply them nationwide (see Internet censorship), or they can, for example, be applied by an Internet service provider to its clients, by an employer to its personnel, by a school to its students, by a library to its visitors, by a parent to a child's computer, or by individual users to their own computers. The motive is often to prevent access to content which the computer's owner(s) or other authorities may consider objectionable. When imposed without the consent of the user, content control can be characterised as a form of internet censorship. Some content-control software includes time control func ...