Clean URL

Clean URLs, also sometimes referred to as RESTful URLs, user-friendly URLs, pretty URLs or search engine-friendly URLs, are URLs intended to improve the usability and accessibility of a website or web service by being immediately and intuitively meaningful to non-expert users. Such URL schemes tend to reflect the conceptual structure of a collection of information and decouple the user interface from a server's internal representation of information. Other reasons for using clean URLs include search engine optimization (SEO), conforming to the representational state transfer (REST) style of software architecture, and ensuring that individual web resources remain consistently at the same URL. This makes the World Wide Web a more stable and useful system, and allows more durable and reliable bookmarking of web resources.

Clean URLs also do not contain implementation details of the underlying web application, which makes it easier to change how a resource is implemented at a later date. For example, many URLs include the filename of a server-side script; if the underlying implementation of the resource changes, such URLs must change along with it. Likewise, when URLs are not clean, moving or restructuring the site database can break links, both internal ones and those from external sites, and broken external links can lead to removal from search engine listings. Clean URLs present a consistent location for resources to user agents regardless of internal structure. A further potential benefit is that concealing internal server or application information can improve the security of a system.


Structure

A URL will often comprise a path, script name, and query string. The query string parameters dictate the content to show on the page, and frequently include information opaque or irrelevant to users, such as internal numeric identifiers for values in a database, illegibly encoded data, session IDs, implementation details, and so on. Clean URLs, by contrast, contain only the path of a resource, in a hierarchy that reflects some logical structure that users can easily interpret and manipulate.
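To make the distinction concrete, the following Python sketch splits two URLs into path segments and query parameters using the standard library's urllib.parse. Both URLs are hypothetical (example.com, index.php and the id and sessionid parameters are invented for illustration), and treating a final path segment that contains a dot as a script name is a rough heuristic, not a general rule.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical example URLs; they do not come from any real site.
QUERY_URL = "https://example.com/index.php?page=article&id=1234&sessionid=8f3a9c"
CLEAN_URL = "https://example.com/articles/clean-urls"


def describe(url: str) -> dict:
    """Split a URL into path segments, an apparent script name and query parameters."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    # Heuristic: a last segment containing a dot is treated as a script file name.
    script = segments[-1] if segments and "." in segments[-1] else None
    return {
        "path_segments": segments,
        "script": script,
        "query": parse_qs(parts.query),  # maps each parameter name to a list of values
    }


if __name__ == "__main__":
    for url in (QUERY_URL, CLEAN_URL):
        print(url, "->", describe(url))
```

Run on these inputs, the query-string URL exposes a script name, an internal numeric ID and a session token, while everything meaningful about the clean URL sits in its path segments and its query is empty.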


Implementation

The implementation of clean URLs involves URL mapping via pattern matching or transparent rewriting techniques. As this usually takes place on the server side, the clean URL is often the only form seen by the user. For search engine optimization purposes, web developers often take this opportunity to include relevant keywords in the URL and remove irrelevant words. Common words that are removed include articles and conjunctions, while descriptive keywords are added to increase user-friendliness and improve search engine rankings. A fragment identifier can be included at the end of a clean URL for references within a page, and need not be user-readable.
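Server-side rewriting of this kind can be sketched as a table of regular expressions, each mapping a clean public path to an internal script and query string. This is a minimal illustration, assuming hypothetical internal scripts (/article.php, /list.php) and parameter names (year, slug); production sites usually express such rules in the web server (for example Apache's mod_rewrite) or in a web framework's router rather than in hand-written code. The sketch also drops any fragment identifier before matching, since browsers do not send the fragment to the server.

```python
import re
from typing import Optional
from urllib.parse import urldefrag, urlencode

# Hypothetical rewrite table: each pattern matches a clean public path and
# names the internal script that should handle it. Named groups become
# query parameters. "/article.php" and "/list.php" are placeholders.
REWRITE_RULES = [
    (re.compile(r"^/articles/(?P<year>\d{4})/(?P<slug>[a-z0-9-]+)/?$"), "/article.php"),
    (re.compile(r"^/articles/?$"), "/list.php"),
]


def rewrite(clean_url: str) -> Optional[str]:
    """Map a clean path onto the internal URL the server would execute."""
    # Browsers never send the fragment to the server, so drop it before matching.
    path, _fragment = urldefrag(clean_url)
    for pattern, script in REWRITE_RULES:
        match = pattern.match(path)
        if match:
            params = match.groupdict()
            return f"{script}?{urlencode(params)}" if params else script
    return None  # no rule matched; a real server would typically answer 404


if __name__ == "__main__":
    for url in ("/articles/2024/clean-urls#history", "/articles/", "/about"):
        print(url, "->", rewrite(url))
```

With these rules, a request for /articles/2024/clean-urls#history is served by /article.php?year=2024&slug=clean-urls, while the user only ever sees the clean form. Keeping the rules in one ordered table means the internal script names can change without touching any public URL.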


Slug

Some systems define a ''slug'' as the part of a URL that identifies a page in human-readable keywords. It is usually the end part of the URL (specifically of the path/pathinfo part), which can be interpreted as the name of the resource, similar to the basename in a filename or the title of a page. The name is based on the use of the word ''slug'' in the news media to indicate a short name given to an article for internal use.

Slugs are typically generated automatically from a page title but can also be entered or altered manually, so that while the page title remains designed for display and human readability, its slug may be optimized for brevity or for consumption by search engines, and can give recipients of a shared bare URL a rough idea of the page's topic. Long page titles may also be truncated to keep the final URL to a reasonable length. Slugs may be entirely lowercase, with accented characters replaced by plain letters of the Latin script and whitespace characters replaced by a hyphen or an underscore to avoid being encoded. Punctuation marks are generally removed, and some systems also remove short, common words such as conjunctions. For example, a slug could be generated this way from the title ''This, That, and the Other! An Outré Collection''.

Another benefit of URL slugs is that they make it easier to find a desired page in a long list of URLs without page titles, such as a minimal list of open tabs exported using a browser extension, and they let users preview the approximate title of a target page in the browser when it is hyperlinked without a title. If a tool that saves web pages locally uses the string after the last slash as the default file name, as wget does, a slug makes the file name more descriptive. Websites that make use of slugs include the Stack Exchange Network, which places the question title after a slash, and Instagram, which uses a ?taken-by=''username'' URL parameter.
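The slug-generation steps described above can be sketched in Python as follows. This is one possible approach, not the algorithm any particular site uses: the stop-word list, the 60-character limit and the choice of hyphens over underscores are all assumptions. Accent removal relies on Unicode NFKD decomposition followed by dropping non-ASCII characters, which works for letters like é but simply discards characters that have no ASCII base letter. The helper default_file_name (a hypothetical name) mimics a download tool that names a saved page after the string following the last slash in the path.

```python
import posixpath
import re
import unicodedata
from urllib.parse import urlsplit

# Assumed stop-word list; the text above only says some systems drop short,
# common words such as conjunctions.
STOP_WORDS = {"a", "an", "and", "or", "the", "of"}


def slugify(title: str, drop_stop_words: bool = True, max_length: int = 60) -> str:
    """Turn a page title into a lowercase, hyphen-separated ASCII slug."""
    # NFKD splits "é" into "e" plus a combining accent; encoding to ASCII with
    # errors="ignore" then drops the accent (and any other non-ASCII character).
    ascii_title = unicodedata.normalize("NFKD", title).encode("ascii", errors="ignore").decode("ascii")
    # Keep runs of letters and digits; punctuation and whitespace separate words.
    words = re.findall(r"[a-z0-9]+", ascii_title.lower())
    if drop_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    slug = "-".join(words)
    # Truncate long slugs, preferring to cut at a hyphen so no word is split.
    if len(slug) > max_length:
        cut = slug[: max_length + 1]
        slug = cut.rsplit("-", 1)[0] if "-" in cut else slug[:max_length]
    return slug


def default_file_name(url: str) -> str:
    """Name a saved page after the string following the last slash in its path.

    Returns "" when the path ends in a slash; a real download tool needs a
    fallback name for that case.
    """
    return posixpath.basename(urlsplit(url).path)


if __name__ == "__main__":
    print(slugify("This, That, and the Other! An Outré Collection"))
    print(default_file_name("https://example.com/questions/123/what-is-a-clean-url"))
```

Under these assumptions, the example title above becomes this-that-other-outre-collection; passing drop_stop_words=False keeps the conjunction and articles and yields this-that-and-the-other-an-outre-collection. The example.com URL is hypothetical, shaped loosely like a question page with its title after the last slash.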


See also

* Information architecture
* Permalink
* Persistent uniform resource locator (PURL)
* URL normalization
* URL redirection
* URL shortening
* Canonical link element


External links


* URL as UI, by Jakob Nielsen
* The User Interface of URLs
* Cool URIs don't change, by Tim Berners-Lee
* Search engine optimization URL