Duplicate Content

Duplicate content is a term used in the field of search engine optimization (SEO) to describe content that appears on more than one web page. The duplicated material can be substantial parts of the content within or across domains and can be either an exact duplicate or closely similar. When multiple pages contain essentially the same content, search engines such as Google and Bing can penalize the copying site or cease displaying it in relevant search results.


Types


Non-malicious

Non-malicious duplicate content may include variations of the same page, such as versions optimized for normal HTML, mobile devices, or printer-friendliness, or store items that can be shown via multiple distinct URLs. Duplicate content issues can also arise when a site is accessible under multiple subdomains, such as with or without the "www.", or where sites fail to handle the trailing slash of URLs correctly. Another common source of non-malicious duplicate content is pagination, in which content and/or corresponding comments are divided into separate pages.

Syndicated content is a popular form of duplicated content. If a site syndicates content from other sites, it is generally considered important to make sure that search engines can tell which version of the content is the original, so that the original can get the benefits of more exposure through search engine results. Ways of doing this include placing a rel=canonical tag on the syndicated page that points back to the original, noindexing the syndicated copy, or putting a link in the syndicated copy that leads back to the original article. If none of these solutions are implemented, the syndicated copy could be treated as the original and gain the benefits.
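In practice, these signals are expressed as HTML tags or HTTP response headers. The following is a minimal sketch, assuming a hypothetical Flask application and placeholder URLs, of how a syndicated copy might send a rel="canonical" or noindex hint via its response headers; it illustrates the general technique rather than any required implementation.

    # Minimal Flask sketch (hypothetical routes and URLs) of two signals a
    # syndicated copy can send so that search engines favour the original:
    #   - a Link header with rel="canonical" pointing back to the original
    #   - an X-Robots-Tag: noindex header asking engines not to index the copy
    # A real site would normally choose one of these signals, not both.
    from flask import Flask, make_response

    app = Flask(__name__)

    @app.route("/syndicated/some-article")
    def syndicated_article():
        resp = make_response("<h1>Syndicated copy of the article</h1>")
        # Tell search engines which URL is the original version of this content.
        resp.headers["Link"] = '<https://original-site.example/some-article>; rel="canonical"'
        # Or ask search engines not to index this copy at all:
        # resp.headers["X-Robots-Tag"] = "noindex"
        return resp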
The number of possible URLs generated by server-side software has also made it difficult for web crawlers to avoid retrieving duplicate content. Endless combinations of HTTP GET (URL-based) parameters exist, of which only a small selection will actually return unique content. For example, a simple online photo gallery may offer three options to users, as specified through HTTP GET parameters in the URL. If there exist four ways to sort images, three choices of thumbnail size, two file formats, and an option to disable user-provided content, then the same set of content can be accessed with 48 different URLs, all of which may be linked on the site. This mathematical combination creates a problem for crawlers, as they must sort through endless combinations of relatively minor scripted changes in order to retrieve unique content.
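The arithmetic behind this example can be made concrete. The sketch below, using hypothetical parameter names and a placeholder domain, enumerates the query-string combinations and confirms that one set of photos becomes 48 crawlable URLs.

    # Enumerate every combination of the gallery's GET parameters (names and
    # domain are illustrative placeholders, not from any real site).
    from itertools import product
    from urllib.parse import urlencode

    sort_orders = ["name", "date", "size", "rating"]   # 4 ways to sort images
    thumb_sizes = ["small", "medium", "large"]         # 3 thumbnail sizes
    formats = ["jpeg", "png"]                          # 2 file formats
    user_content = ["on", "off"]                       # toggle user-provided content

    urls = [
        "https://gallery.example/photos?" + urlencode(
            {"sort": s, "thumb": t, "format": f, "user_content": u}
        )
        for s, t, f, u in product(sort_orders, thumb_sizes, formats, user_content)
    ]
    print(len(urls))  # 4 * 3 * 2 * 2 = 48 URLs, all serving the same photo set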
There may also be similar content between different web pages in the form of similar product content. This is usually noticed on e-commerce websites, where the use of similar keywords for similar categories of products leads to this form of non-malicious duplicate content. It is often the case when new iterations and versions of products are released but the seller or the e-commerce website moderators do not update the whole product descriptions.


Malicious

Malicious duplicate content refers to content that is intentionally duplicated in an effort to manipulate search results and gain more traffic. This is known as search spam. A number of tools are available to verify the uniqueness of content. In certain cases, search engines penalize the rankings of websites, and of individual offending pages, in the search engine results pages (SERPs) for duplicate content considered "spammy."


Detecting duplicate content
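Such uniqueness-checking tools generally work by measuring how much of one page's text reappears on another. A minimal sketch of one common approach, comparing word n-gram "shingles" with Jaccard similarity, is shown below; it is an illustration of the general technique and an assumption about how such a check could work, not the method of any particular tool or search engine.

    # Compare two texts by the overlap of their word 3-gram shingles.
    def shingles(text: str, n: int = 3) -> set:
        words = text.lower().split()
        return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

    def jaccard_similarity(a: str, b: str) -> float:
        sa, sb = shingles(a), shingles(b)
        if not sa or not sb:
            return 0.0
        return len(sa & sb) / len(sa | sb)

    original = "Duplicate content is content that appears on more than one web page."
    copy = "Duplicate content is content which appears on more than one web page."
    # Prints 0.54 here; unrelated pages typically score near 0.0.
    print(round(jaccard_similarity(original, copy), 2))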


Resolutions

If the content has been copied, there are multiple resolutions available to both parties:
* Get the content removed from the copier's site by contacting the site's owner and requesting that the copied content be taken down.
* Hire an attorney to send a takedown notice to the copier.
* Rewrite the content to make the site's content unique again.

An HTTP 301 redirect (301 Moved Permanently) is a method of dealing with duplicate content that redirects users and search engine crawlers to the single pertinent version of the content.
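As a sketch only, the example below shows how such a redirect might be issued from a hypothetical Flask application, both for a page that has moved and for collapsing the "www." host variant mentioned earlier; the domain and paths are placeholders, and in practice the redirect is often configured in the web server itself.

    # Issue 301 (permanent) redirects so that only one URL form of each page
    # is presented to users and crawlers (hypothetical domain and paths).
    from flask import Flask, redirect, request

    app = Flask(__name__)

    @app.before_request
    def redirect_to_canonical_host():
        # Collapse the "www." variant onto the bare domain.
        if request.host.startswith("www."):
            canonical = request.url.replace("://www.", "://", 1)
            return redirect(canonical, code=301)

    @app.route("/old-page")
    def old_page():
        # The old location permanently redirects to the page's single current URL.
        return redirect("https://site.example/new-page", code=301)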

