A Uniform Resource Locator (URL), colloquially termed a web address, is a reference to a web resource that specifies its location on a computer network and a mechanism for retrieving it. A URL is a specific type of Uniform Resource Identifier (URI), although many people use the two terms interchangeably. URLs occur most commonly to reference web pages (HTTP) but are also used for file transfer (FTP), email (mailto), database access (JDBC), and many other applications.
Most web browsers display the URL of a web page above the page in an address bar. A typical URL could have the form http://www.example.com/index.html, which indicates a protocol (http), a hostname (www.example.com), and a file name (index.html).
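The decomposition of the typical URL into protocol, hostname, and file name can be illustrated with a short sketch. It uses Python's standard `urllib.parse.urlsplit`; the helper name `describe` and the dictionary keys are illustrative choices, not part of any standard.

```python
from urllib.parse import urlsplit


def describe(url: str) -> dict:
    """Split a URL into the three parts named in the text.

    `describe` is a hypothetical helper written for this illustration.
    """
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,             # text before "://"
        "hostname": parts.hostname,           # the host part of the authority
        "file_name": parts.path.lstrip("/"),  # path without its leading slash
    }


print(describe("http://www.example.com/index.html"))
```

For URLs with deeper paths, `file_name` in this sketch would hold the whole path rather than only its last segment.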
History
Uniform Resource Locators were defined in 1994 by Tim Berners-Lee, the inventor of the World Wide Web, and the URI working group of the Internet Engineering Task Force (IETF), as an outcome of collaboration started at the IETF Living Documents birds of a feather session in 1992.
The format combines the pre-existing system of domain names (created in 1985) with file path syntax, where slashes are used to separate directory names and filenames. Conventions already existed where server names could be prefixed to complete file paths, preceded by a double slash (//).
Berners-Lee later expressed regret at the use of dots to separate the parts of the domain name within URIs, wishing he had used slashes throughout, and also said that, given the colon following the first component of a URI, the two slashes before the domain name were unnecessary.
An early (1993) draft of the HTML Specification referred to "Universal" Resource Locators. This wording was dropped some time between June 1994 and October 1994 (draft-ietf-uri-url-08.txt).
Syntax
Every HTTP URL conforms to the syntax of a generic URI.
A web browser will usually dereference a URL by performing an HTTP request to the specified host, by default on port number 80. URLs using the https scheme require that requests and responses be made over a secure connection to the website.
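How a client might choose the port when dereferencing a URL can be sketched as follows. The default of 80 for http comes from the text above; the value 443 for https is the conventional default and is an addition not stated in this article. The names `DEFAULT_PORTS` and `effective_port` are hypothetical.

```python
from urllib.parse import urlsplit

# Default ports per scheme. 80 is stated in the text; 443 is the
# conventional HTTPS default and is an assumption beyond the text.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str):
    """Return the explicit port if the URL has one, else the scheme default."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)  # None for unknown schemes


print(effective_port("http://www.example.com/index.html"))
print(effective_port("http://www.example.com:8080/index.html"))
```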
Internationalized URL
Internet users around the world use a wide variety of languages and alphabets, and expect to be able to create URLs in their own local alphabets. An Internationalized Resource Identifier (IRI) is a form of URL that includes Unicode characters. All modern browsers support IRIs. The parts of the URL requiring special treatment for different alphabets are the domain name and path.
The domain name in the IRI is known as an Internationalized Domain Name (IDN). Web and Internet software automatically convert the domain name into Punycode usable by the Domain Name System; for example, the Chinese URL http://例子.卷筒纸 becomes http://xn--fsqu00a.xn--3lr804guic/. The xn-- prefix indicates that the label was not originally ASCII.
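The conversion of a domain name to its ASCII form can be tried with Python's built-in `idna` codec, as in the minimal sketch below. That codec implements the older IDNA 2003 rules, so it may differ from what a current browser does for some names; the helper `to_ascii_host` is a hypothetical name used only here.

```python
def to_ascii_host(host: str) -> str:
    """Encode each label of a host name to its ASCII (Punycode) form.

    Uses Python's built-in "idna" codec (IDNA 2003); this is a sketch,
    not necessarily the exact algorithm a given browser applies.
    """
    return host.encode("idna").decode("ascii")


ascii_host = to_ascii_host("例子.卷筒纸")
print(ascii_host)

# Decoding with the same codec recovers the original Unicode name.
print(ascii_host.encode("ascii").decode("idna"))
```

Each non-ASCII label in the result carries the xn-- prefix described above, while labels that were already ASCII are left unchanged.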
The URL path name can also be specified by the user in the local writing system. If not already encoded, it is converted to UTF-8, and any characters not part of the basic URL character set are escaped as hexadecimal using percent-encoding; for example, the Japanese URL http://example.com/引き割り.html becomes http://example.com/%E5%BC%95%E3%81%8D%E5%89%B2%E3%82%8A.html. The target computer decodes the address and displays the page.
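The two steps described above (UTF-8 conversion, then percent-encoding of bytes outside the basic character set) can be reproduced with Python's standard `urllib.parse.quote`, which encodes to UTF-8 by default and leaves `/` unescaped. The variable names are illustrative only.

```python
from urllib.parse import quote, unquote

path = "/引き割り.html"

# quote() converts the text to UTF-8 and escapes every byte that is not
# an unreserved ASCII character; "/" is kept as-is by default.
encoded_path = quote(path)
print("http://example.com" + encoded_path)
# http://example.com/%E5%BC%95%E3%81%8D%E5%89%B2%E3%82%8A.html

# The receiving side reverses the process.
decoded_path = unquote(encoded_path)
print(decoded_path)
```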
Protocol-relative URLs
Protocol-relative links (PRL), also known as protocol-relative URLs (PRURL), are URLs that have no protocol specified. For example, //example.com will use the protocol of the current page, typically HTTP or HTTPS.
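Resolution of a protocol-relative URL against the current page can be sketched with Python's standard `urllib.parse.urljoin`. The page addresses below are made-up examples, and `resolve` is a hypothetical helper name.

```python
from urllib.parse import urljoin


def resolve(current_page: str, link: str) -> str:
    """Resolve a link found on `current_page`, inheriting its protocol
    when the link starts with "//" and names no protocol of its own."""
    return urljoin(current_page, link)


# The same link yields a different protocol depending on the page it is on.
print(resolve("https://www.example.org/page.html", "//example.com"))
print(resolve("http://www.example.org/page.html", "//example.com"))
```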
See also
* Hyperlink
* PURL (Persistent URL)
* CURIE (Compact URI)
* Fragment identifier
* Internet Resource Locator (IRL)
* Internationalized resource identifier (IRI)
* Semantic URL
* Clean URL
* Typosquatting
* Uniform Resource Identifier
* URL normalization
* Use of slashes in networking
External links
URL specification at WHATWG