
Semantic HTML is the use of HTML markup to reinforce the semantics, or meaning, of the information in web pages and web applications rather than merely to define its presentation or look. Semantic HTML is processed by traditional web browsers as well as by many other user agents. CSS is used to suggest its presentation to human users.


History

HTML has included semantic markup since its inception. In an HTML document, the author may, among other things, "start with a title; add headings and paragraphs; add emphasis to [the] text; add images; add links to other pages; [and] use various kinds of lists". Various versions of the HTML standard have also included presentational markup such as <font> (added in HTML 3.2; removed in HTML 4.0 Strict), <i> (all versions) and <center> (added in HTML 3.2). There are also the semantically neutral span and div elements. Since the late 1990s, when Cascading Style Sheets were beginning to work in most browsers, web authors have been encouraged to avoid presentational HTML markup with a view to the separation of content and presentation.

In 2001, Tim Berners-Lee participated in a discussion of the Semantic Web, in which it was proposed that intelligent software 'agents' might one day automatically trawl the Web and find, filter and correlate previously unrelated, published facts for the benefit of end users. Such agents are not commonplace even now, but some of the ideas of Web 2.0, mashups and price comparison websites may be coming close. The main difference between these web application hybrids and Berners-Lee's semantic agents is that the current aggregation and hybridisation of information is usually designed in by web developers, who already know the web locations and the API semantics of the specific data they wish to mash, compare and combine.

An important type of web agent that does crawl and read web pages automatically, without prior knowledge of what it might find, is the web crawler or search-engine spider. These software agents depend on the semantic clarity of the web pages they find, as they use various techniques and algorithms to read and index millions of web pages a day and provide web users with search facilities.

For search-engine spiders to be able to rate the significance of the pieces of text they find in HTML documents, and also for those creating mashups and other hybrids, as well as for more automated agents as they are developed, the semantic structures that exist in HTML need to be widely and uniformly applied to bring out the meaning of published information.

While the true semantic web may depend on complex RDF ontologies and metadata, every HTML document makes its contribution to the meaningfulness of the Web by the correct use of headings, lists, titles and other semantic markup wherever possible. This "plain" use of HTML has been called "Plain Old Semantic HTML" or POSH. The correct use of Web 2.0 'tagging' creates folksonomies that may be equally or even more meaningful to many.
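
The point about rating the significance of text can be illustrated with a minimal sketch that uses Python's standard html.parser module. The WEIGHTS table, the rank_text helper and the sample page are illustrative assumptions, not the ranking method of any real search engine.

```python
from html.parser import HTMLParser

# Hypothetical weights: how much an indexer might favour text by element.
WEIGHTS = {"title": 5, "h1": 4, "h2": 3, "li": 2, "p": 1}

SAMPLE = """
<html><head><title>Tea guide</title></head>
<body>
<h1>Brewing tea</h1>
<p>Use fresh water.</p>
<h2>Varieties</h2>
<ul><li>Green</li><li>Black</li></ul>
<p><font size="6"><b>Storage</b></font></p>
</body></html>
"""


class WeightedTextParser(HTMLParser):
    """Collects (weight, tag, text) for text inside weighted elements."""

    def __init__(self):
        super().__init__()
        self.stack = []   # weighted elements that are currently open
        self.found = []   # (weight, tag, text) tuples in document order

    def handle_starttag(self, tag, attrs):
        if tag in WEIGHTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            tag = self.stack[-1]
            self.found.append((WEIGHTS[tag], tag, text))


def rank_text(html):
    """Return the collected text, highest weight first."""
    parser = WeightedTextParser()
    parser.feed(html)
    parser.close()
    # sorted() is stable, so equal weights keep document order.
    return sorted(parser.found, key=lambda item: -item[0])


if __name__ == "__main__":
    for weight, tag, text in rank_text(SAMPLE):
        print(weight, tag, text)
```

In this sketch the word "Storage", styled with font and b to look like a heading, is ranked no higher than ordinary paragraph text, because nothing in the markup says that it is a heading.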
HTML 5 introduced new semantic elements such as section, article, footer, progress, nav, aside, mark, and time. Overall, the goal of the W3C is to gradually introduce more ways for browsers, developers, and crawlers to distinguish between different types of data, allowing for benefits such as better display in browsers on different devices.

Not all presentational elements were formally deprecated in the HTML 4.01 and XHTML recommendations, although their use was discouraged. In HTML 5, some of those elements, such as i and b, are still specified, as their meaning has been clearly defined "as to be stylistically offset from the normal prose without conveying any extra importance".
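
As a hedged illustration of how elements such as time and progress help software distinguish types of data, the sketch below reads their machine-readable attributes with Python's standard html.parser module. The MachineValueParser class, the read_values helper and the sample fragment are assumptions made for this example.

```python
from html.parser import HTMLParser

SAMPLE = (
    '<article><p>Published <time datetime="2011-06-02">2 June 2011</time>.</p>'
    '<progress value="3" max="4">75%</progress></article>'
)


class MachineValueParser(HTMLParser):
    """Collects machine-readable values from time and progress elements."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "time" and "datetime" in attributes:
            # The datetime attribute carries the value in a parseable form.
            self.values.append(("time", attributes["datetime"]))
        elif tag == "progress" and "value" in attributes:
            # max defaults to 1 when the attribute is absent.
            maximum = float(attributes.get("max", "1"))
            self.values.append(("progress", float(attributes["value"]) / maximum))


def read_values(html):
    parser = MachineValueParser()
    parser.feed(html)
    parser.close()
    return parser.values


if __name__ == "__main__":
    print(read_values(SAMPLE))
```

The human-readable text ("2 June 2011", "75%") is left to the browser, while the attributes give a program an unambiguous date and a numeric fraction.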


Considerations

In cases where a document requires more precise semantics than those expressed in HTML alone, fragments of the document may be enclosed within span or div elements with meaningful class names such as <span class="author"> and <div class="invoice">. Where these class names are also a fragment identifier within a schema or ontology, they may link to a more defined meaning.
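
A minimal sketch of how a program might read such class-based semantics, using Python's standard html.parser module. The ClassTextParser class, the text_by_class helper and the sample markup are hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not affect nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}

SAMPLE = (
    '<div class="invoice"><p>Invoice 17 prepared by '
    '<span class="author">A. <em>Writer</em></span></p></div>'
)


class ClassTextParser(HTMLParser):
    """Collects the text of each outermost element carrying a given class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0      # greater than 0 while inside a matching element
        self.buffer = []
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        if self.depth:
            self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # Normalise whitespace before storing the collected text.
            self.results.append(" ".join("".join(self.buffer).split()))
            self.buffer = []

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def text_by_class(html, class_name):
    parser = ClassTextParser(class_name)
    parser.feed(html)
    parser.close()
    return parser.results


if __name__ == "__main__":
    print(text_by_class(SAMPLE, "author"))
    print(text_by_class(SAMPLE, "invoice"))
```

The sketch assumes well-formed input; it relies on every non-void start tag having a matching end tag.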
Microformats formalise this approach to semantics in HTML. One important restriction of this approach is that markup based on element inclusion must meet the well-formedness conditions. As these documents are broadly tree-structured, only balanced fragments from a sub-tree can be marked up in this way. Marking up an arbitrary section of HTML would require a mechanism independent of the markup structure itself, such as XPointer.

Good semantic HTML also improves the accessibility of web documents (see also Web Content Accessibility Guidelines). For example, when a document has been marked up correctly, a screen reader or audio browser can ascertain its structure and avoid wasting the visually impaired user's time by reading out repeated or irrelevant information.
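
The accessibility benefit can be sketched as follows, again with Python's standard html.parser module. The SKIPPED set, the main_text helper and the sample page are assumptions for illustration; real assistive technology typically lets the user navigate between such regions rather than discarding them.

```python
from html.parser import HTMLParser

# Hypothetical choice of regions treated as repeated page furniture.
SKIPPED = {"nav", "header", "footer", "aside"}

SAMPLE = """
<body>
<nav><a href="/">Home</a> <a href="/about">About</a></nav>
<main><h1>Opening hours</h1><p>Open daily from 9 to 5.</p></main>
<footer>Copyright notice</footer>
</body>
"""


class MainContentParser(HTMLParser):
    """Collects text that lies outside the skipped regions."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # greater than 0 while inside a skipped region
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())
        if text and not self.skip_depth:
            self.chunks.append(text)


def main_text(html):
    parser = MainContentParser()
    parser.feed(html)
    parser.close()
    return parser.chunks


if __name__ == "__main__":
    print(main_text(SAMPLE))
```

If the same page were built only from div elements, the program would have no reliable way to tell the navigation links from the main content.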


Google "rich snippets"

In 2010, Google specified three forms of structured metadata that its systems will use to find structured semantic content within webpages. Such information, when related to reviews, people profiles, business listings, and events, will be used by Google to enhance the "snippet", the short piece of quoted text that is shown when the page appears in search listings. Google specifies that the data may be given using microdata, microformats or RDFa. Microdata is specified inside itemtype and itemprop attributes added to existing HTML elements; microformat keywords are added inside class attributes as discussed above; and RDFa relies on rel, typeof and property attributes added to existing elements.
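
A minimal sketch of reading microdata, limited to the itemtype and itemprop attributes named above and written with Python's standard html.parser module. The ItempropParser class, the read_item helper and the sample review markup are illustrative assumptions; this is not Google's extraction logic, and it handles only flat items whose properties contain plain text.

```python
from html.parser import HTMLParser

SAMPLE = (
    '<div itemscope itemtype="https://schema.org/Review">'
    '<span itemprop="name">Great teapot</span> by '
    '<span itemprop="author">A. Writer</span>'
    '</div>'
)


class ItempropParser(HTMLParser):
    """Collects itemprop name/text pairs from a single flat item."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.current = None     # itemprop currently being read, if any
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "itemtype" in attributes:
            self.item_type = attributes["itemtype"]
        if "itemprop" in attributes:
            self.current = attributes["itemprop"]
            self.properties[self.current] = ""

    def handle_endtag(self, tag):
        # Simplification: any end tag closes the property being read.
        self.current = None

    def handle_data(self, data):
        if self.current:
            self.properties[self.current] += data.strip()


def read_item(html):
    parser = ItempropParser()
    parser.feed(html)
    parser.close()
    return parser.item_type, parser.properties


if __name__ == "__main__":
    print(read_item(SAMPLE))
```

The word "by" between the two span elements is ignored because it is not inside any element that carries an itemprop attribute.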


See also

* RDFa
* Microformats
* Semantic Web
* HTML landmarks
* Semantics (computer science)
* XML
* Microdata (HTML)
* HTML elements (complete list)


External links


schema.org is an initiative launched on 2 June 2011 by Bing, Google and Yahoo!