Clean URLs (also known as user-friendly URLs, pretty URLs, search-engine–friendly URLs or RESTful URLs) are web addresses or
Uniform Resource Locator
A uniform resource locator (URL), colloquially known as an address on the World Wide Web, Web, is a reference to a web resource, resource that specifies its location on a computer network and a mechanism for retrieving it. A URL is a specific t ...
s (URLs) intended to improve the
usability
Usability can be described as the capacity of a system to provide a condition for its users to perform the tasks safely, effectively, and efficiently while enjoying the experience. In software engineering, usability is the degree to which a softw ...
and
accessibility of a
website
A website (also written as a web site) is any web page whose content is identified by a common domain name and is published on at least one web server. Websites are typically dedicated to a particular topic or purpose, such as news, educatio ...
,
web application
A web application (or web app) is application software that is created with web technologies and runs via a web browser. Web applications emerged during the late 1990s and allowed for the server to dynamically build a response to the request, ...
, or
web service
A web service (WS) is either:
* a service offered by an electronic device to another electronic device, communicating with each other via the Internet, or
* a server running on a computer device, listening for requests at a particular port over a n ...
by being immediately and intuitively meaningful to non-expert
user
Ancient Egyptian roles
* User (ancient Egyptian official), an ancient Egyptian nomarch (governor) of the Eighth Dynasty
* Useramen, an ancient Egyptian vizier also called "User"
Other uses
* User (computing), a person (or software) using an ...
s. Such URL schemes tend to reflect the conceptual structure of a collection of information and
decouple the
user interface
In the industrial design field of human–computer interaction, a user interface (UI) is the space where interactions between humans and machines occur. The goal of this interaction is to allow effective operation and control of the machine fro ...
from a server's internal representation of information. Other reasons for using clean URLs include
search engine optimization
Search engine optimization (SEO) is the process of improving the quality and quantity of Web traffic, website traffic to a website or a web page from web search engine, search engines. SEO targets unpaid search traffic (usually referred to as ...
(SEO),
conforming to the
representational state transfer
REST (Representational State Transfer) is a software architectural style that was created to describe the design and guide the development of the architecture for the World Wide Web. REST defines a set of constraints for how the architecture of ...
(REST) style of software architecture, and ensuring that individual
web resources remain consistently at the same URL. This makes the
World Wide Web
The World Wide Web (WWW or simply the Web) is an information system that enables Content (media), content sharing over the Internet through user-friendly ways meant to appeal to users beyond Information technology, IT specialists and hobbyis ...
a more stable and useful system, and allows more durable and reliable
bookmarking of web resources.
Clean URLs also do not contain implementation details of the underlying web application. This carries the benefit of reducing the difficulty of changing the implementation of the resource at a later date. For example, many URLs include the filename of a
server-side script, such as , or . If the underlying implementation of a resource is changed, such URLs would need to change along with it. Likewise, when URLs are not "clean", if the site database is moved or restructured it has the potential to cause
broken links, both internally and from external sites, the latter of which can lead to removal from
search engine
A search engine is a software system that provides hyperlinks to web pages, and other relevant information on World Wide Web, the Web in response to a user's web query, query. The user enters a query in a web browser or a mobile app, and the sea ...
listings. The use of clean URLs presents a consistent location for resources to
user agent
On the Web, a user agent is a software agent responsible for retrieving and facilitating end-user interaction with Web content. This includes all web browsers, such as Google Chrome and Safari
A safari (; originally ) is an overland jour ...
s regardless of internal structure. A further potential benefit to the use of clean URLs is that the concealment of internal server or application information can improve the
security
Security is protection from, or resilience against, potential harm (or other unwanted coercion). Beneficiaries (technically referents) of security may be persons and social groups, objects and institutions, ecosystems, or any other entity or ...
of a system.
Structure
A URL will often comprise a
path
A path is a route for physical travel – see Trail.
Path or PATH may also refer to:
Physical paths of different types
* Bicycle path
* Bridle path, used by people on horseback
* Course (navigation), the intended path of a vehicle
* Desir ...
, script name, and
query string
A query string is a part of a uniform resource locator ( URL) that assigns values to specified parameters. A query string commonly includes fields added to a base URL by a Web browser or other client application, for example as part of an HTML doc ...
. The query string parameters dictate the content to show on the page, and frequently include information opaque or irrelevant to users—such as internal numeric
identifier
An identifier is a name that identifies (that is, labels the identity of) either a unique object or a unique ''class'' of objects, where the "object" or class may be an idea, person, physical countable object (or class thereof), or physical mass ...
s for values in a
database
In computing, a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and a ...
, illegibly
encoded
In communications and information processing, code is a system of rules to convert information—such as a letter, word, sound, image, or gesture—into another form, sometimes shortened or secret, for communication through a communication ...
data,
session ID
In computer science, a session identifier, session ID or session token is a piece of data that is used in network communications (often over HTTPS) to identify a session, a series of related message exchanges. Session identifiers become necessar ...
s, implementation details, and so on. Clean URLs, by contrast, contain only the path of a resource, in a hierarchy that reflects some logical structure that users can easily interpret and manipulate.
Implementation
The implementation of clean URLs involves
URL mapping via pattern matching or transparent
rewriting
In mathematics, computer science, and logic, rewriting covers a wide range of methods of replacing subterms of a formula with other terms. Such methods may be achieved by rewriting systems (also known as rewrite systems, rewrite engines, or reduc ...
techniques. As this usually takes place on the server side, the clean URL is often the only form seen by the user.
For search engine optimization purposes, web developers often take this opportunity to include relevant keywords in the URL and remove irrelevant words. Common words that are removed include
articles and
conjunctions, while descriptive keywords are added to increase user-friendliness and improve search engine rankings.
A
fragment identifier
In computer hypertext, a URI fragment is a string of characters that refers to a resource that is subordinate to another, primary resource. The primary resource is identified by a Uniform Resource Identifier (URI), and the fragment identifier poi ...
can be included at the end of a clean URL for references within a page, and need not be user-readable.
Slug
The name ''
slug
Slug, or land slug, is a common name for any apparently shell-less Terrestrial mollusc, terrestrial gastropod mollusc. The word ''slug'' is also often used as part of the common name of any gastropod mollusc that has no shell, a very reduced ...
'' is based on the use of ''slug'' by the news media to indicate a short name given to an article for internal use. Some systems define a ''slug'' as the part of a URL that identifies a page in
human-readable keywords, while others use a broader definition emphasizing that legible slugs are more user-friendly. It is usually the end part of the URL (specifically of the
path
A path is a route for physical travel – see Trail.
Path or PATH may also refer to:
Physical paths of different types
* Bicycle path
* Bridle path, used by people on horseback
* Course (navigation), the intended path of a vehicle
* Desir ...
/
pathinfo part), which can be interpreted as the name of the resource, similar to the
basename in a
filename
A filename or file name is a name used to uniquely identify a computer file in a file system. Different file systems impose different restrictions on filename lengths.
A filename may (depending on the file system) include:
* name – base ...
or the title of a page.
Slugs are typically generated automatically from a page title but can also be entered or altered manually, so that while the page title remains designed for display and human readability, its slug may be optimized for brevity or for consumption by search engines, as well as providing recipients of a shared bare URL with a rough idea of the page's topic. Long page titles may also be truncated to keep the final URL to a reasonable length.
Slugs may be entirely lowercase, with accented characters replaced by letters from the
Latin script
The Latin script, also known as the Roman script, is a writing system based on the letters of the classical Latin alphabet, derived from a form of the Greek alphabet which was in use in the ancient Greek city of Cumae in Magna Graecia. The Gree ...
and
whitespace character
A whitespace character is a character data element that represents white space when text is
rendered for display by a computer.
For example, a ''space'' character (, ASCII 32) represents blank space such as a word divider in a Western scrip ...
s replaced by a
hyphen
The hyphen is a punctuation mark used to join words and to separate syllables of a single word. The use of hyphens is called hyphenation.
The hyphen is sometimes confused with dashes (en dash , em dash and others), which are wider, or with t ...
or an
underscore
An underscore or underline is a line drawn under a segment of text. In proofreading, underscoring is a convention that says "set this text in italic type", traditionally used on manuscript or typescript as an instruction to the printer. Its ...
to avoid being
encoded
In communications and information processing, code is a system of rules to convert information—such as a letter, word, sound, image, or gesture—into another form, sometimes shortened or secret, for communication through a communication ...
. Punctuation marks are generally removed, and some also remove short, common words such as
conjunctions. For example, the title ''This, That, and the Other! An Outré Collection'' could have a generated slug of .
Another benefit of URL slugs is the facilitated ability to find a desired page from a long list of URLs without page titles, such as a minimal list of opened
tabs exported using a
browser extension
A browser extension is a software module for customizing a web browser. Browsers typically allow users to install a variety of extensions, including user interface modifications, cookie management, ad blocking, and the custom scripting and st ...
, and the ability to preview the approximate title of a target page in the browser if
hyperlink
In computing, a hyperlink, or simply a link, is a digital reference providing direct access to Data (computing), data by a user (computing), user's point and click, clicking or touchscreen, tapping. A hyperlink points to a whole document or to ...
ed without title.
If a tool to save web pages locally uses the string after the last slash as the default
file name, like
wget does, a slug makes the file name more descriptive.
Websites that make use of slugs include
Stack Exchange Network with question title after slash, and
Instagram
Instagram is an American photo sharing, photo and Short-form content, short-form video sharing social networking service owned by Meta Platforms. It allows users to upload media that can be edited with Social media camera filter, filters, be ...
with
?taken-by=''username''
URL parameter.
See also
*
Information architecture
Information architecture (IA) is the structural design of shared information environments; the art and science of organizing and labelling websites, intranets, online communities and software to support usability and findability; and an emerging ...
*
Permalink
*
Persistent uniform resource locator (PURL)
*
URL normalization
URI normalization is the process by which Uniform Resource Identifier, URIs are modified and standardized in a consistent manner. The goal of the normalization process is to transform a URI into a normalized URI so it is possible to determine i ...
*
URL redirection
URL redirection, also called URL forwarding, is a World Wide Web technique for making a web page available under more than one URL address. When a web browser attempts to open a URL that has been redirected, a page with a different URL is opened. ...
*
URL shortening
URL shortening is a technique on the World Wide Web in which a Uniform Resource Locator (URL) may be made substantially shorter and still direct to the required page. This is achieved by using a redirect which links to the web page that has a ...
*
*
Canonical link element
A canonical link element is an HTML element that helps webmasters prevent duplicate content issues in search engine optimization by specifying the " canonical" or "preferred" version of a web page. It is described in RFC 6596, which went live in ...
Notes
/>
References
{{reflist
External links
URL as UI
by Jakob Nielsen
The User Interface of URLs
Cool URIs don't change
by Tim Berners-Lee
Sir Timothy John Berners-Lee (born 8 June 1955), also known as TimBL, is an English computer scientist best known as the inventor of the World Wide Web, the HTML markup language, the URL system, and HTTP. He is a professorial research fellow a ...
Search engine optimization
URL