Semantic URL
   HOME

TheInfoList



OR:

Clean URLs, also sometimes referred to as RESTful URLs, user-friendly URLs, pretty URLs or search engine-friendly URLs, are URLs intended to improve the usability and accessibility of a
website A website (also written as a web site) is a collection of web pages and related content that is identified by a common domain name and published on at least one web server. Examples of notable websites are Google, Facebook, Amazon, and Wi ...
or web service by being immediately and intuitively meaningful to non-expert users. Such URL schemes tend to reflect the conceptual structure of a collection of information and decouple the
user interface In the industrial design field of human–computer interaction, a user interface (UI) is the space where interactions between humans and machines occur. The goal of this interaction is to allow effective operation and control of the machine f ...
from a server's internal representation of information. Other reasons for using clean URLs include
search engine optimization Search engine optimization (SEO) is the process of improving the quality and quantity of website traffic to a website or a web page from search engines. SEO targets unpaid traffic (known as "natural" or " organic" results) rather than dire ...
(SEO), conforming to the representational state transfer (REST) style of software architecture, and ensuring that individual
web resource A web resource is any identifiable resource (digital, physical, or abstract) present on or connected to the World Wide Web.< ...
s remain consistently at the same URL. This makes the
World Wide Web The World Wide Web (WWW), commonly known as the Web, is an information system enabling documents and other web resources to be accessed over the Internet. Documents and downloadable media are made available to the network through web ...
a more stable and useful system, and allows more durable and reliable
bookmarking Bookmarking (also "gene bookmarking" or "mitotic bookmarking") refers to a potential mechanism of transmission of gene expression programs through cell division. During mitosis, gene transcription is silenced and most transcription factors are r ...
of web resources. Clean URLs also do not contain implementation details of the underlying web application. This carries the benefit of reducing the difficulty of changing the implementation of the resource at a later date. For example, many URLs include the filename of a server-side script, such as , or . If the underlying implementation of a resource is changed, such URLs would need to change along with it. Likewise, when URLs are not "clean", if the site database is moved or restructured it has the potential to cause broken links, both internally and from external sites, the latter of which can lead to removal from search engine listings. The use of clean URLs presents a consistent location for resources to
user-agent In computing, a user agent is any software, acting on behalf of a user, which "retrieves, renders and facilitates end-user interaction with Web content". A user agent is therefore a special kind of software agent. Some prominent examples of us ...
s regardless of internal structure. A further potential benefit to the use of clean URLs is that the concealment of internal server or application information can improve the
security" \n\n\nsecurity.txt is a proposed standard for websites' security information that is meant to allow security researchers to easily report security vulnerabilities. The standard prescribes a text file called \"security.txt\" in the well known locat ...
of a system.


Structure

A URL will often comprise a
path A path is a route for physical travel – see Trail. Path or PATH may also refer to: Physical paths of different types * Bicycle path * Bridle path, used by people on horseback * Course (navigation), the intended path of a vehicle * Desire p ...
, script name, and
query string A query string is a part of a uniform resource locator (URL) that assigns values to specified parameters. A query string commonly includes fields added to a base URL by a Web browser or other client application, for example as part of an HTML, cho ...
. The query string parameters dictate the content to show on the page, and frequently include information opaque or irrelevant to users—such as internal numeric identifiers for values in a
database In computing, a database is an organized collection of data stored and accessed electronically. Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage. The design of databases s ...
, illegibly
encoded In communications and information processing, code is a system of rules to convert information—such as a letter, word, sound, image, or gesture—into another form, sometimes shortened or secret, for communication through a communication ...
data,
session ID In computer science, a session identifier, session ID or session token is a piece of data that is used in network communications (often over HTTP) to identify a session, a series of related message exchanges. Session identifiers become necessary ...
s, implementation details, and so on. Clean URLs, by contrast, contain only the path of a resource, in a hierarchy that reflects some logical structure that users can easily interpret and manipulate.


Implementation

The implementation of clean URLs involves
URL mapping A web framework (WF) or web application framework (WAF) is a software framework that is designed to support the development of web applications including web services, web resources, and web APIs. Web frameworks provide a standard way to build an ...
via pattern matching or transparent rewriting techniques. As this usually takes place on the server side, the clean URL is often the only form seen by the user. For search engine optimization purposes, web developers often take this opportunity to include relevant keywords in the URL and remove irrelevant words. Common words that are removed include
article Article often refers to: * Article (grammar), a grammatical element used to indicate definiteness or indefiniteness * Article (publishing), a piece of nonfictional prose that is an independent part of a publication Article may also refer to: G ...
s and
conjunction Conjunction may refer to: * Conjunction (grammar), a part of speech * Logical conjunction, a mathematical operator ** Conjunction introduction, a rule of inference of propositional logic * Conjunction (astronomy), in which two astronomical bodies ...
s, while descriptive keywords are added to increase user-friendliness and improve search engine rankings. A fragment identifier can be included at the end of a clean URL for references within a page, and need not be user-readable.


Slug

Some systems define a ''slug'' as the part of a URL that identifies a page in
human-readable A human-readable medium or human-readable format is any encoding of data or information that can be naturally read by humans. In computing, ''human-readable'' data is often encoded as ASCII or Unicode text, rather than as binary data. In most c ...
keywords. It is usually the end part of the URL (specifically of the
path A path is a route for physical travel – see Trail. Path or PATH may also refer to: Physical paths of different types * Bicycle path * Bridle path, used by people on horseback * Course (navigation), the intended path of a vehicle * Desire p ...
/ pathinfo part), which can be interpreted as the name of the resource, similar to the
basename basename is a standard computer program on Unix and Unix-like operating systems. When basename is given a pathname, it will delete any prefix up to the last slash ('/') character and return the result. basename is described in the Single UNIX S ...
in a filename or the title of a page. The name is based on the use of the word '' slug'' in the news media to indicate a short name given to an article for internal use. Slugs are typically generated automatically from a page title but can also be entered or altered manually, so that while the page title remains designed for display and human readability, its slug may be optimized for brevity or for consumption by search engines, as well as providing recipients of a shared bare URL with the rough idea of the page's topic. Long page titles may also be truncated to keep the final URL to a reasonable length. Slugs may be entirely lowercase, with accented characters replaced by letters from the
Latin script The Latin script, also known as Roman script, is an alphabetic writing system based on the letters of the classical Latin alphabet, derived from a form of the Greek alphabet which was in use in the ancient Greek city of Cumae, in southern I ...
and
whitespace character In computer programming, whitespace is any character or series of characters that represent horizontal or vertical space in typography. When rendered, a whitespace character does not correspond to a visible mark, but typically does occupy an area ...
s replaced by a
hyphen The hyphen is a punctuation mark used to join words and to separate syllables of a single word. The use of hyphens is called hyphenation. ''Son-in-law'' is an example of a hyphenated word. The hyphen is sometimes confused with dashes ( figure ...
or an
underscore An underscore, ; also called an underline, low line, or low dash; is a line drawn under a segment of text. In proofreading, underscoring is a convention that says "set this text in italic type", traditionally used on manuscript or typescript a ...
to avoid being
encoded In communications and information processing, code is a system of rules to convert information—such as a letter, word, sound, image, or gesture—into another form, sometimes shortened or secret, for communication through a communication ...
. Punctuation marks are generally removed, and some also remove short, common words such as conjunctions. For example, the title ''This, That, and the Other! An Outré Collection'' could have a generated slug of . Another benefit of URL slugs is the facilitated ability to find a desired page out of a long list of URLs without page titles, such as a minimal list of opened tabs exported using a browser extension, and the ability to preview the approximate title of a target page in the browser if hyperlinked to without title. Should a tool to save web pages locally using the string after the last slash as the default
file name A filename or file name is a name used to uniquely identify a computer file in a directory structure. Different file systems impose different restrictions on filename lengths. A filename may (depending on the file system) include: * name &ndas ...
, like
wget GNU Wget (or just Wget, formerly Geturl, also written as its package name, wget) is a computer program that retrieves content from web servers. It is part of the GNU Project. Its name derives from "World Wide Web" and " ''get''." It supports do ...
does, a slug makes the file name more descriptive. Websites that make use of slugs include Stack Exchange Network with question title after slash, and Instagram with ?taken-by=''username'' URL parameter.


See also

*
Information architecture Information architecture (IA) is the structural design of shared information environments; the art and science of organizing and labelling websites, intranets, online communities and software to support usability and findability; and an emerging ...
*
Permalink A permalink or permanent link is a URL that is intended to remain unchanged for many years into the future, yielding a hyperlink that is less susceptible to link rot. Permalinks are often rendered simply, that is, as clean URLs, to be easier to ...
*
Persistent uniform resource locator A persistent uniform resource locator (PURL) is a uniform resource locator (URL) (i.e., location-based uniform resource identifier or URI) that is used to redirect to the location of the requested web resource. PURLs redirect HTTP clients using ...
(PURL) *
URL normalization URI normalization is the process by which URIs are modified and standardized in a consistent manner. The goal of the normalization process is to transform a URI into a normalized URI so it is possible to determine if two syntactically differen ...
*
URL redirection URL redirection, also called URL forwarding, is a World Wide Web technique for making a web page available under more than one URL address. When a web browser attempts to open a URL that has been redirected, a page with a different URL is opened ...
*
URL shortening URL shortening is a technique on the World Wide Web in which a Uniform Resource Locator (URL) may be made substantially shorter and still direct to the required page. This is achieved by using a redirect which links to the web page that has ...
* *
Canonical link element A canonical link element is an HTML element that helps webmasters prevent duplicate content issues in search engine optimization by specifying the "canonical" or "preferred" version of a web page. It is described in RFC 6596, which went live in Apr ...


References

{{reflist


External links


URL as UI
by
Jakob Nielsen Jacob or Jakob Nielsen may refer to: * Jacob Nielsen, Count of Halland (died c. 1309), great grandson of Valdemar II of Denmark * , Norway (1768-1822) * Jakob Nielsen (mathematician) (1890–1959), Danish mathematician known for work on automorphi ...

The User Interface of URLs

Cool URIS don't change
by Tim Berners-Lee Search engine optimization URL