A Uniform Resource Locator (URL), colloquially termed a web address, is a reference to a web resource that specifies its location on a computer network and a mechanism for retrieving it. A URL is a specific type of Uniform Resource Identifier (URI), although many people use the two terms interchangeably. URLs occur most commonly to reference web pages (http) but are also used for file transfer (ftp), email (mailto), database access (JDBC), and many other applications. Most web browsers display the URL of a web page above the page in an address bar. A typical URL could have the form http://www.example.com/index.html, which indicates a protocol (http), a hostname (www.example.com), and a file name (index.html).
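
The three components named above can be pulled apart with Python's standard `urllib.parse.urlsplit`. This is only an illustrative sketch: `describe_url` is a hypothetical helper, and it assumes the "file name" is simply the last slash-separated segment of the path.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into protocol, hostname and file name (hypothetical helper)."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,    # e.g. "http"
        "hostname": parts.hostname,  # lower-cased host, without any port
        # Assumption: the file name is the final segment of the path.
        "file_name": parts.path.rsplit("/", 1)[-1],
    }


if __name__ == "__main__":
    print(describe_url("http://www.example.com/index.html"))
    # {'protocol': 'http', 'hostname': 'www.example.com', 'file_name': 'index.html'}
```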


History

Uniform Resource Locators were defined in 1994 by Tim Berners-Lee, the inventor of the World Wide Web, and the URI working group of the Internet Engineering Task Force (IETF), as an outcome of collaboration started at the IETF Living Documents birds of a feather session in 1992.

The format combines the pre-existing system of domain names (created in 1985) with file path syntax, where slashes are used to separate directory names and filenames. Conventions already existed in which server names could be prefixed to complete file paths, preceded by a double slash (//).

Berners-Lee later expressed regret at the use of dots to separate the parts of the domain name within URIs, wishing he had used slashes throughout. He also said that, given the colon following the first component of a URI, the two slashes before the domain name were unnecessary.

An early (1993) draft of the HTML Specification referred to "Universal" Resource Locators. The word "Universal" was dropped some time between June 1994 () and October 1994 (draft-ietf-uri-url-08.txt).


Syntax

Every HTTP URL conforms to the syntax of a generic URI. A web browser will usually dereference a URL by performing an HTTP request to the specified host, by default on port number 80. URLs using the https scheme require that requests and responses be made over a secure connection to the website.
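
As a rough illustration, Python's standard `urllib.parse.urlsplit` can recover the host, port and request path a client would use when dereferencing a URL. The port falls back to 80 for http, as described above, and to 443 for https, the conventional default for secure HTTP (an addition not stated in the text above). `request_target` is a hypothetical helper, not a library function, and this sketch ignores userinfo and other less common URL parts.

```python
from urllib.parse import urlsplit

# Conventional default ports, used only when the URL does not name one.
DEFAULT_PORTS = {"http": 80, "https": 443}


def request_target(url: str) -> tuple:
    """Return (host, port, request path) for an HTTP(S) URL (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.scheme not in DEFAULT_PORTS:
        raise ValueError(f"not an HTTP(S) URL: {url!r}")
    port = parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]
    path = parts.path or "/"       # an empty path means the root resource
    if parts.query:
        path += "?" + parts.query  # the query is sent along with the path
    # The fragment (#...) is handled by the browser and never sent to the server.
    return parts.hostname, port, path


if __name__ == "__main__":
    print(request_target("http://www.example.com/index.html"))
    # ('www.example.com', 80, '/index.html')
    print(request_target("https://www.example.com/search?q=url#top"))
    # ('www.example.com', 443, '/search?q=url')
```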


Internationalized URL

Internet users are distributed throughout the world, use a wide variety of languages and alphabets, and expect to be able to create URLs in their own local alphabets. An Internationalized Resource Identifier (IRI) is a form of URL that includes Unicode characters. All modern browsers support IRIs. The parts of the URL requiring special treatment for different alphabets are the domain name and path.

The domain name in the IRI is known as an Internationalized Domain Name (IDN). Web and Internet software automatically converts the domain name into punycode usable by the Domain Name System; for example, the Chinese URL http://例子.卷筒纸 becomes http://xn--fsqu00a.xn--3lr804guic/. The xn-- prefix indicates that the label was not originally ASCII.

The URL path name can also be specified by the user in the local writing system. If not already encoded, it is converted to UTF-8, and any characters not part of the basic URL character set are escaped as hexadecimal using percent-encoding; for example, the Japanese URL http://example.com/引き割り.html becomes http://example.com/%E5%BC%95%E3%81%8D%E5%89%B2%E3%82%8A.html. The target computer decodes the address and displays the page.
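
Both conversions described above can be sketched with the Python standard library: the built-in "idna" codec produces the xn-- (Punycode) form of each host label, and `urllib.parse.quote` encodes the path as UTF-8 and percent-escapes it. This is a minimal sketch under stated assumptions: `to_ascii_url` is a hypothetical helper, the input path is assumed not to be percent-encoded already, query and fragment are passed through unchanged, any user:password@ part is dropped, and the built-in codec follows the older IDNA 2003 rules (the third-party `idna` package implements IDNA 2008).

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def to_ascii_url(url: str) -> str:
    """Rewrite an IRI-style URL into an ASCII-only URL (hypothetical helper)."""
    parts = urlsplit(url)
    # Host: each non-ASCII dot-separated label becomes Punycode with an
    # "xn--" prefix (Python's built-in IDNA 2003 codec).
    host = parts.hostname.encode("idna").decode("ascii")
    # netloc is rebuilt from host and port only, so userinfo is dropped.
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # Path: quote() encodes to UTF-8 and percent-escapes everything except
    # unreserved characters and "/". Assumes the path is not already encoded.
    path = quote(parts.path or "/")
    # Query and fragment are passed through unchanged in this sketch.
    return urlunsplit((parts.scheme, netloc, path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(to_ascii_url("http://例子.卷筒纸"))
    # http://xn--fsqu00a.xn--3lr804guic/
    encoded = to_ascii_url("http://example.com/引き割り.html")
    print(encoded)
    # http://example.com/%E5%BC%95%E3%81%8D%E5%89%B2%E3%82%8A.html
    # Decoding the path restores the original characters.
    print(unquote(urlsplit(encoded).path))
    # /引き割り.html
```

Decoding reverses both steps: `unquote` restores the path, and decoding the ASCII host bytes with the "idna" codec restores the Unicode labels.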


Protocol-relative URLs

Protocol-relative links (PRL), also known as protocol-relative URLs (PRURL), are URLs that have no protocol specified. For example, //example.com will use the protocol of the current page, typically HTTP or HTTPS.
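
A browser resolves such a reference against the URL of the page that contains it. Python's `urllib.parse.urljoin` applies the same kind of relative-reference resolution, so it can serve as a rough stand-in for the browser here; the page URLs and the logo.png path below are purely illustrative.

```python
from urllib.parse import urljoin

# A protocol-relative reference: it starts with "//" and names no scheme.
REFERENCE = "//example.com/logo.png"

# Illustrative page URLs; the reference inherits whichever scheme the page uses.
for page_url in ("http://www.example.org/index.html",
                 "https://www.example.org/index.html"):
    print(urljoin(page_url, REFERENCE))
# http://example.com/logo.png
# https://example.com/logo.png
```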


See also

* Hyperlink
* PURL – Persistent URL
* CURIE (Compact URI)
* Fragment identifier
* Internet Resource Locator (IRL)
* Internationalized resource identifier (IRI)
* Semantic URL
* Typosquatting
* Uniform Resource Identifier
* URL normalization
* Use of slashes in networking




External links


* URL specification at WHATWG
* The Components of a URL from IBM