The noindex value of an HTML robots meta tag requests that automated Internet bots avoid indexing a web page. [Robots and the META element, official W3 specification]
Reasons to use this meta tag include advising robots not to index a very large database, web pages that are very transitory, web pages that are under development, web pages that one wishes to keep slightly more private, or the printer-friendly and mobile-friendly versions of pages. Because honoring a website's noindex tag is up to the author of the search robot, these tags are sometimes ignored. The interpretation of the noindex tag also differs slightly from one search engine company to the next.


Noindexing entire pages

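To ask robots not to index a page, place a robots meta tag with the content value "noindex" in the page's head section, for example <meta name="robots" content="noindex" />. Possible values for the meta tag content are: "none", "all", "index", "noindex", "nofollow", and "follow". A combination of the values is also possible, for example "noindex, nofollow".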
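As a minimal sketch of where the directive sits, assuming an otherwise ordinary HTML5 document, the page below carries the tag in its head section. The title and body text are placeholders, and the commented-out tag shows one possible combined value.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <!-- Ask compliant robots not to index this page -->
  <meta name="robots" content="noindex">
  <!-- Alternative: combine values, e.g. do not index and do not follow links -->
  <!-- <meta name="robots" content="noindex, nofollow"> -->
  <title>Don't index this page</title>
</head>
<body>
  <p>Placeholder content.</p>
</body>
</html>
```

Only one of the two robots tags would normally be present. Whether a given robot honors either form is up to that robot.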


Bot-specific directives

The noindex directive can be restricted to certain bots by specifying a different "name" value in the meta tag. For example, to block only Google's bot, use the name "googlebot" in place of "robots"; to block Bing's bot, use "bingbot"; to block Baidu's bot, use "baiduspider". [Using meta tags to block access to your site, Google Webmasters Tools Help]
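The bot-specific forms might look like the sketch below. The name values are the crawler tokens commonly documented by each search engine; they are an assumption here and should be checked against each engine's current documentation.

```html
<head>
  <!-- Applies only to Google's crawler -->
  <meta name="googlebot" content="noindex">
  <!-- Applies only to Bing's crawler -->
  <meta name="bingbot" content="noindex">
  <!-- Applies only to Baidu's crawler -->
  <meta name="baiduspider" content="noindex">
</head>
```

Each tag is independent, so a page can carry any subset of them. Bots that are not named continue to follow the generic robots tag, if one is present.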


robots.txt file

A robots.txt file can be used to block crawling.


Noindexing part of a page

It is also possible to exclude part of a web page, for example navigation text, from being indexed rather than the whole page. There are various techniques for doing this, and several can be used in combination. Google's main indexing spider, Googlebot, is not known to recognize any of these techniques.


<noindex> tag

The Russian search engine Yandex introduced a new <noindex> tag, which prevents indexing of the content between the opening and closing tags. Because <noindex> is not a standard HTML element, the comment pair <!--noindex--> and <!--/noindex--> can be used instead to allow the source code to validate.
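Both forms are sketched below in a single paragraph. The surrounding <p> element is only illustrative.

```html
<p>
  Do index this text.
  <!-- Tag form: not standard HTML, so it may fail validation -->
  <noindex>Don't index this text.</noindex>
  <!-- Comment form: validates, since only comments are added -->
  <!--noindex-->Don't index this text.<!--/noindex-->
</p>
```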

Other indexing spiders also recognize the <noindex> tag, including Atomz.


Microformat

There is a 2005 draft microformats specification with the same functionality. The Robot Exclusion Profile looks for the attribute and value class="robots-noindex" in HTML tags; text inside an element that carries this class is meant to be left out of the index, while text in elements without it is indexed as usual.

A combination of values is also possible, for example class="robots-noindex robots-nofollow".
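A sketch of the class in use on several element types follows. The choice of div, span and p is illustrative, not something the draft is known to require.

```html
<p>Do index this text.</p>
<div class="robots-noindex">Don't index this text.</div>
<p><span class="robots-noindex">Don't index this text.</span></p>
<p class="robots-noindex">Don't index this text.</p>
<!-- Combined values on one element -->
<div class="robots-noindex robots-nofollow">Text.</div>
```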


Yahoo!

In 2007, Yahoo! introduced functionality similar to the microformat into its spider. However, Yahoo!'s spider is incompatible with the microformat in that it looks for the value class="robots-nocontent" and only this value; text inside an element carrying that class is not indexed.
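The Yahoo! variant can be sketched the same way as the microformat, with only the class name changed. The element types are again illustrative.

```html
<p>Do index this text.</p>
<div class="robots-nocontent">Don't index this text.</div>
<p><span class="robots-nocontent">Don't index this text.</span></p>
<p class="robots-nocontent">Don't index this text.</p>
```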


SharePoint

SharePoint 2010's iFilter excludes content inside a <div> tag with the attribute and value class="noindex". Inner <div>s were initially not excluded, but this may have changed. It is also unknown whether the attribute can be applied to tags other than <div>.
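A minimal sketch of the SharePoint form, using a top-level <div> only, since the behaviour for nested <div>s and for other tags is described above as uncertain:

```html
<div>Do index this text.</div>
<!-- Excluded by the SharePoint 2010 iFilter -->
<div class="noindex">Don't index this text.</div>
```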


Structured comments


Google Search Appliance

The Google Search Appliance uses structured comments: text placed between a <!--googleoff: all--> comment and a matching <!--googleon: all--> comment is not indexed.
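The comment pair might be used as in the sketch below. The enclosing <p> element is illustrative.

```html
<p>
  Do index this text.
  <!--googleoff: all-->Don't index this text.<!--googleon: all-->
</p>
```

Because the markers are ordinary HTML comments, browsers and other crawlers simply ignore them.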

Other indexing spiders also use their own structured comments.


See also

* Nofollow link attribute
* Robots Exclusion Standard

