In the Standard Generalized Markup Language (SGML), an entity is a primitive data type, which associates a string with either a unique alias (such as a user-specified name) or an SGML reserved word (such as #DEFAULT). Entities are foundational to the organizational structure and definition of SGML documents. The SGML specification defines numerous entity types, which are distinguished by keyword qualifiers and context. An entity string value may variously consist of plain text, SGML tags, and/or references to previously defined entities. Certain entity types may also invoke external documents. Entities are called by reference.
Entity types
Entities are classified as general or parameter:
* A ''general'' entity can only be referenced within the document content.
* A ''parameter'' entity can only be referenced within the document type definition (DTD).
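The distinction can be sketched with a short DTD fragment. The entity names, element names, and replacement text below are illustrative assumptions, not taken from any particular DTD:

```sgml
<!-- Parameter entity: declared with "%", referenced only inside the DTD -->
<!ENTITY % inline "em | strong">
<!ELEMENT p - - (#PCDATA | %inline;)*>

<!-- General entity: referenced from document content as &copyright; -->
<!ENTITY copyright "Copyright 1996 Example Press">
```

Here %inline; is expanded while the DTD is read, so the content model of p becomes (#PCDATA | em | strong)*, whereas &copyright; is expanded only where it appears in the document instance.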
Entities are also further classified as parsed or unparsed:
* A ''parsed'' entity contains text, which will be incorporated into the document and parsed if the entity is referenced. A parameter entity can only be a parsed entity.
* An ''unparsed'' entity contains any kind of data, and a reference to it will result in the application's merely being notified of the entity's presence; the content of the entity will not be parsed, even if it is text. An unparsed entity can only be external.
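As a rough illustration of an unparsed entity, the fragment below declares a notation, an external entity tied to that notation with NDATA, and an attribute of type ENTITY that can name it. All names and identifiers (gif, logo, figure, source, logo.gif) are hypothetical:

```sgml
<!-- Notation that tells the application what kind of data to expect -->
<!NOTATION gif SYSTEM "image/gif">

<!-- Unparsed external entity: NDATA marks the content as not to be parsed -->
<!ENTITY logo SYSTEM "logo.gif" NDATA gif>

<!-- An attribute of type ENTITY takes the entity name as its value -->
<!ELEMENT figure - O EMPTY>
<!ATTLIST figure source ENTITY #REQUIRED>

<!-- In the document instance: <figure source="logo"> -->
```

The parser does not read logo.gif; it passes the entity's identifiers and notation on to the application.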
Internal and external entities
An internal entity has a value that is either a literal string, or a parsed string comprising markup and entities defined in the same document (such as a Document Type Declaration or subdocument). In contrast, an external entity has a declaration that invokes an external document, thereby necessitating the intervention of an entity manager to resolve the external document reference.
System entities
An entity declaration may have a literal value, or may have some combination of an optional SYSTEM identifier, which allows SGML parsers to process an entity's string referent as a resource identifier, and an optional PUBLIC identifier, which identifies the entity independent of any particular representation. In XML, a subset of SGML, an entity declaration may not have a PUBLIC identifier without a SYSTEM identifier.
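The difference can be sketched with two declarations of entity sets. The first follows the SGML form, where a PUBLIC identifier may stand alone. The second follows the XML form used by the XHTML 1.0 DTDs, where the public identifier is followed by a system identifier. The choice of these two entity sets as examples is an assumption made for illustration:

```sgml
<!-- SGML: a PUBLIC identifier alone is permitted -->
<!ENTITY % ISOlat1 PUBLIC "ISO 8879:1986//ENTITIES Added Latin 1//EN">

<!-- XML: the PUBLIC identifier must be followed by a system identifier -->
<!ENTITY % HTMLlat1 PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN" "xhtml-lat1.ent">
```

In the first case the entity manager has to map the public identifier to an actual resource, typically through a catalog. In the second, the system identifier gives the parser a location to fall back on.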
SGML document entity
When an external entity references a complete SGML document, it is known in the calling document as an SGML document entity. An SGML document is a text document with SGML markup defined in an SGML prologue (i.e., the DTD and subdocuments). A complete SGML document comprises not only the document instance itself, but also the prologue and, optionally, the SGML declaration (which defines the document's markup syntax and declares the character encoding).
Syntax
An entity is defined via an ''entity declaration'' in a document's document type definition (DTD). For example:
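The four declarations below are a sketch reconstructed from the description that follows. The entity names and values come from that description, while the exact layout is an assumption:

```sgml
<!ENTITY greeting1 "Hello world">
<!ENTITY greeting2 SYSTEM "file:///hello.txt">
<!ENTITY % greeting3 "¡Hola!">
<!ENTITY greeting4 "%greeting3; means Hello!">
```

Note that greeting4 is built from a reference to the parameter entity greeting3, which is expanded when the declaration is read. In XML, a parameter-entity reference inside an entity value like this is only permitted in the external DTD subset.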
This DTD markup declares the following:
* An internal general entity named greeting1 exists and consists of the string "Hello world".
* An external general entity named greeting2 exists and consists of the text found in the resource identified by the URI file:///hello.txt.
* An internal parameter entity named greeting3 exists and consists of the string "¡Hola!".
* An internal general entity named greeting4 exists and consists of the string "¡Hola! means Hello!".
Names for entities must follow the rules for SGML names, and there are limitations on where entities can be referenced.
Parameter entities are referenced by placing the entity name between "%" and ";". Parsed general entities are referenced by placing the entity name between "&" and ";". Unparsed entities are referenced by placing the entity name in the value of an attribute declared as type ENTITY.
The general entities from the example above might be referenced in a document as follows:
'&greeting1;' is a common test string.
The content of hello.txt is: &greeting2;
In Spanish, &greeting4;
When parsed, this document would be reported to the downstream application the same as if it had been written as follows, assuming the hello.txt file contains the text "Salutations":
'Hello world' is a common test string.
The content of hello.txt is: Salutations
In Spanish, ¡Hola! means Hello!
A reference to an undeclared entity is an error unless a default entity has been defined. For example:
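A default entity is declared by using the reserved word #DEFAULT in place of an entity name. In this minimal sketch, the replacement text is an assumption chosen for illustration:

```sgml
<!ENTITY #DEFAULT "This entity is not defined">
```

With such a declaration in the DTD, a reference to an otherwise undeclared general entity would resolve to the default text instead of being reported as an error.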
Additional markup constructs and processor options may affect whether and how entities are processed. For example, a processor may optionally ignore external entities.
Character entities
Standard entity sets for SGML and some of its derivatives have been developed as mnemonic devices, to ease document authoring when there is a need to use characters that are not easily typed or that are not widely supported by legacy character encodings. Each such entity consists of just one character from the Universal Character Set. Although any character can be referenced using a numeric character reference, a character entity reference allows characters to be referenced by name instead of code point.
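The relationship between the two kinds of reference can be sketched with the character U+00E9 (é). The declaration is modeled on the Latin-1 entity set of HTML 4, and the sample word is an arbitrary choice:

```sgml
<!-- Declaration modeled on the HTML 4 Latin-1 entity set -->
<!ENTITY eacute CDATA "&#233;">

<!-- Three equivalent ways to write "café" in an HTML 4 document -->
caf&eacute;   <!-- character entity reference -->
caf&#233;     <!-- decimal numeric character reference -->
caf&#xE9;     <!-- hexadecimal numeric character reference -->
```

The named form depends on the entity being declared or built in, while the numeric forms work without any declaration.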
For example, HTML 4 has 252 built-in character entities that do not need to be explicitly declared, while XML has five. XHTML has the same five as XML, but if its DTDs are explicitly used, then it has 253 (&apos; being the extra entity beyond those in HTML 4).
See also
* Declarative programming
* Object (computer science)
* List of XML and HTML character entity references
* XML external entity attack