HTML element that helps webmasters prevent duplicate content issues in search engine optimization by specifying the "canonical" or "preferred" version of a web page. It is described in RFC 6596, which was published in April 2012.
Purpose
A major problem for search engines is to determine the original source for documents that are available on multiple URLs. Content duplication can happen in many ways, including:
* Duplication due to GET-parameters
* Duplication with multiple URLs due to CMS
* Duplication due to accessibility on different hosts/protocols
* Duplication due to print versions of websites
Duplicate content issues occur when the same content is accessible from multiple URLs. For example, a page URL and the same URL with a query parameter appended would be considered by search engines to be entirely different pages, even though both URLs may reference the same content.
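The effect can be illustrated with a small Python sketch. The URLs below are hypothetical, and the `canonical_key` helper is an illustrative simplification (it ignores the scheme, a leading `www.` and the query string); it is not how any search engine actually groups duplicates, and dropping the query string is only safe when the parameters do not change the content.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical URLs that are assumed to serve the same content.
urls = [
    "https://example.com/page.php",
    "https://example.com/page.php?parameter=1",  # GET parameter
    "http://example.com/page.php",               # different protocol
    "https://www.example.com/page.php",          # different host
]


def canonical_key(url: str) -> tuple[str, str]:
    """Illustrative grouping key: host without 'www.' plus the path."""
    parts = urlsplit(url)
    host = (parts.hostname or "").removeprefix("www.")
    return host, parts.path


# Compared as plain strings, every URL is distinct.
distinct = set(urls)

# Grouped by the simplified key, they collapse into a single page.
groups = defaultdict(list)
for url in urls:
    groups[canonical_key(url)].append(url)
```

Without a hint from the site owner, a crawler has to guess which member of such a group is the original; the canonical link element lets the owner state it explicitly.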
In February 2009, Google, Yahoo and Microsoft announced support for the canonical link element, which can be inserted into the <head> section of a web page to allow webmasters to prevent these issues. The canonical link element helps webmasters make clear to the search engines which page should be credited as the original.
How search engines handle rel=canonical
Search engines try to utilize canonical link definitions as an output filter for their search results. If multiple URLs contain the same content in the result set, the canonical link URL definitions will likely be incorporated to determine the original source of the content. "For example, when Google finds identical content instances, it decides to show one of them. Its choice of the resource to display in the search results will depend upon the search query."
According to Google, the canonical link element is not considered to be a directive, but rather a hint that the ranking algorithm will "honor strongly".
While the canonical link element has its benefits, Matt Cutts, then the head of Google's webspam team, has said that the search engine prefers the use of 301 redirects. Cutts said the preference for redirects is because Google's spiders can choose to ignore a canonical link element if they deem it more beneficial to do so.
Factors Google considers when choosing a canonical for a page
There are multiple factors Google evaluates when determining the canonical version of a page, including:
* The canonical tag you set up: This is the most direct way to suggest the preferred URL to search engines.
* Internal linking: Pages with strong internal links pointing to them are more likely to be treated as canonical.
* Sitemap.xml: The URLs listed in the sitemap also influence Google's decision.
* Redirects: Google may choose a URL redirected to from others as the canonical version.
Implementation
Semantic tag
The canonical link element can either be placed in the semantic HTML of a document or sent in its HTTP header. For non-HTML documents, the HTTP header is an alternative way to set a canonical URL.
By the HTML 5 standard, the <link rel="canonical"> HTML element must be within the <head> section of the document.
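For the HTTP header variant, the canonical URL is carried in a `Link` header with `rel="canonical"`. The following Python sketch only formats that header; the function name `canonical_link_header` and the PDF URL are hypothetical, and the check for unsafe characters is a deliberately minimal assumption rather than full URL validation.

```python
def canonical_link_header(canonical_url: str) -> tuple[str, str]:
    """Return an HTTP header (name, value) pair declaring the canonical URL."""
    # Angle brackets, quotes and spaces would break the header syntax,
    # so the URL is expected to be percent-encoded already.
    if any(ch in canonical_url for ch in '<> "'):
        raise ValueError("URL must be percent-encoded before use in a Link header")
    return "Link", f'<{canonical_url}>; rel="canonical"'


# Hypothetical example: a PDF that is reachable under several URLs.
name, value = canonical_link_header("https://example.com/downloads/white-paper.pdf")
header_line = f"{name}: {value}"
```

Here `header_line` is `Link: <https://example.com/downloads/white-paper.pdf>; rel="canonical"`, which a server would send along with every duplicate URL of the file.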
Self-hyperlink
Some sites such as Stack Overflow have on-page hyperlinks that point to the page itself. The usability benefits include easier copying of the hyperlink's target URL or title when the browser or a browser extension offers a "Copy link text" context menu option for hyperlinks, the ability to retrieve the original URL from a saved page when the browser has not stored it in a comment inside the file, and the ability to duplicate the opened page into a new tab right next to the current one if the browser lacks such a feature.
Examples
HTML
Below is an example of HTML code that uses the canonical link element inside the <head> tag. The code could be used on a page such as https://example.com/page.php?parameter=1 to tell search engines that https://example.com/page.php is the preferred version of the webpage.
...
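As a hedged illustration of how such markup can be read back, the Python sketch below embeds a minimal page for https://example.com/page.php?parameter=1 and extracts the canonical URL with the standard library's `html.parser`. The page content, the `CanonicalFinder` class and the `find_canonical` helper are assumptions made for this example, not part of any search engine's implementation.

```python
from html.parser import HTMLParser

# Hypothetical markup served at https://example.com/page.php?parameter=1
PAGE = """<!DOCTYPE html>
<html>
<head>
  <title>Example page</title>
  <link rel="canonical" href="https://example.com/page.php">
</head>
<body><p>Content</p></body>
</html>"""


class CanonicalFinder(HTMLParser):
    """Collect the href of the first <link rel="canonical"> inside <head>."""

    def __init__(self) -> None:
        super().__init__()
        self.in_head = False
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head and self.canonical is None:
            attributes = dict(attrs)
            # rel is a space-separated list of tokens, compared case-insensitively.
            rel_tokens = (attributes.get("rel") or "").lower().split()
            if "canonical" in rel_tokens:
                self.canonical = attributes.get("href")

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_canonical(html: str):
    """Return the canonical URL declared in the document head, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical
```

Calling `find_canonical(PAGE)` returns `https://example.com/page.php`; a canonical link placed in the body is ignored by this sketch, in line with the requirement that the element appear in the head.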
URL normalization