
An annotation is extra information associated with a particular point in a document or other piece of information. It can be a note that includes a comment or explanation. Annotations are sometimes presented in the margin of book pages. For annotations of different digital media, see web annotation and text annotation.


Literature, grammar and educational purposes


Practising visually

Annotation practices include highlighting a phrase or sentence and adding a comment, circling a word that needs defining, posing a question when something is not fully understood, and writing a short summary of a key section. Annotation also invites students to "(re)construct a history through material engagement and exciting DIY (Do-It-Yourself) annotation practices." The annotation tools available today let students work in a more collaborative, connected way than was previously possible.


Text and film annotation

Text and film annotation is a technique that involves adding comments and text to a film. Analyzing videos is never entirely free of preconceived notions, so the first step for researchers is to find their bearings within the field of possible research approaches and reflect on their own basic assumptions. Annotations can be placed within the video and can be made while the video data is being recorded. Annotation is used as a tool in text and film to record one's thoughts and emotions in the markings, and further annotations can be added at any step of the analysis. The anthropologist Clifford Geertz calls this a "thick description." This gives a sense of how useful annotation can be, especially when it adds description to film.


Medieval marginalia

Marginalia refers to writing or decoration in the margins of a manuscript. Medieval marginalia is so well known that amusing or disconcerting instances of it are fodder for viral aggregators such as BuzzFeed and Brain Pickings, and the fascination with other readers’ reading is manifest in sites such as Melville's Marginalia Online or Harvard's online exhibit of marginalia from six personal libraries. Marginalia also appears on other websites such as Pinterest, and in meme generators and GIF tools.


Textual scholarship

Textual scholarship is a discipline that often uses annotation to describe texts and physical documents, or to add historical context to them, making them easier to understand.


Student uses

Students often highlight passages in books in order to engage actively with the text. Students can use annotations to refer back to key phrases easily, or add marginalia to aid studying and to find connections between the text and prior knowledge or running themes. Annotated bibliographies add commentary on the relevance or quality of each source, in addition to the usual bibliographic information that merely identifies the source. Students use annotation not only for academic purposes but also to interpret their own thoughts, feelings, and emotions. Students use sites such as Scalar and Omeka for this. Annotation spans multiple genres, such as mathematics, film, linguistics, and literary theory, in which students find it especially helpful. Most students reported the annotation process as helpful for improving overall writing ability, grammar, and academic vocabulary knowledge.


Mathematical expression annotation

Mathematical expressions (symbols and formulae) can be annotated with their natural language meaning. This is essential for disambiguation, since symbols may have different meanings (e.g., "E" can be "energy" or "expectation value"). The annotation process can be facilitated and accelerated through recommendation, e.g., using the "AnnoMathTeX" system that is hosted by Wikimedia.
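
The disambiguation step can be illustrated with a minimal Python sketch. The candidate table and the `annotate` helper are hypothetical and are not taken from AnnoMathTeX; the sketch only shows that an ambiguous symbol needs an explicit choice, while an unambiguous one can be filled in automatically.

```python
# Hypothetical candidate meanings per symbol (illustrative only).
CANDIDATES = {
    "E": ["energy", "expectation value"],
    "m": ["mass"],
    "c": ["speed of light"],
}


def annotate(symbols, choices):
    """Map each symbol to a natural-language meaning.

    `choices` holds the annotator's explicit decisions. A symbol with
    exactly one candidate is annotated automatically; an ambiguous or
    unknown symbol without an explicit choice stays unannotated (None).
    """
    annotated = {}
    for symbol in symbols:
        options = CANDIDATES.get(symbol, [])
        chosen = choices.get(symbol)
        if chosen is None and len(options) == 1:
            chosen = options[0]
        annotated[symbol] = chosen
    return annotated


# Symbols of E = m c^2, with "E" resolved by the annotator.
result = annotate(["E", "m", "c"], {"E": "energy"})
```

Without the explicit choice for "E", the sketch leaves that symbol unannotated rather than guessing between its two candidate meanings.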


Learning and instruction

From a cognitive perspective, annotation has an important role in learning and instruction. As part of guided noticing, it involves highlighting, naming or labelling, and commenting on aspects of visual representations to help focus learners' attention on specific visual aspects. In other words, it means the assignment of typological representations (culturally meaningful categories) to topological representations (e.g. images). This is especially important when experts, such as medical doctors, interpret visualizations in detail and explain their interpretations to others, for example by means of digital technology. Here, annotation can be a way to establish common ground between interactants with different levels of knowledge. The value of annotation has been empirically confirmed, for example, in a study which shows that in computer-based teleconsultations the integration of image annotation and speech leads to significantly improved knowledge exchange compared with the use of images and speech without annotation.


On YouTube

Annotations were removed from YouTube on January 15, 2019, after around a decade of service. They had allowed users to provide information that popped up during videos, but YouTube indicated that they did not work well on small mobile screens and were being abused.


Software and engineering


Text documents

Markup languages like XML and HTML annotate text in a way that is syntactically distinguishable from that text. They can be used to add information about the desired visual presentation, or machine-readable semantic information, as in the semantic web.
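
As a small illustration of what "syntactically distinguishable" means, the following Python sketch parses an XML fragment with the standard library and separates the plain text from the annotations wrapped around it. The fragment, the `term` element and its `id` attribute are made up for the example.

```python
import xml.etree.ElementTree as ET

# Made-up fragment: the <term> element annotates the word "margin".
MARKED_UP = '<p>Annotations appear in the <term id="t1">margin</term> of a page.</p>'

root = ET.fromstring(MARKED_UP)

# The text with all markup stripped away.
plain_text = "".join(root.itertext())

# The annotations: every element below the root, with its attributes
# and the text span it marks.
annotations = [
    (element.tag, dict(element.attrib), element.text)
    for element in root.iter()
    if element is not root
]

print(plain_text)   # Annotations appear in the margin of a page.
print(annotations)  # [('term', {'id': 't1'}, 'margin')]
```

Because the angle brackets cannot be confused with the text itself, a parser can recover both layers without any knowledge of what the text says.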


Tabular data

Tabular data includes formats such as CSV and XLS. The process of assigning semantic annotations from ontologies to tabular data is referred to as semantic labelling, or semantic annotation. Semantic labelling is often done in a (semi-)automatic fashion. Semantic labelling techniques work on entity columns, numeric columns, coordinates, and more.


Semantic labelling techniques

There are several types of semantic labelling techniques, most of which utilise machine learning. Following the work of Flach, they can be categorised as geometric (using lines and planes, such as support-vector machines and linear regression), probabilistic (e.g., conditional random fields), logical (e.g., decision tree learning), and non-ML techniques (e.g., balancing coverage and specificity). Note that the geometric, probabilistic, and logical machine learning models are not mutually exclusive.


Geometric techniques

Pham et al. use the Jaccard index and TF-IDF similarity for textual data and the Kolmogorov–Smirnov test for numeric data. Alobaid and Corcho use fuzzy clustering (c-means) to label numeric columns.
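
The Jaccard index mentioned above measures the overlap of two sets as the size of their intersection divided by the size of their union. The Python sketch below applies it to the tokens of a table column and of two candidate label sets. It is not Pham et al.'s pipeline: the tokenisation, the sample data and the convention for two empty sets are assumptions made for illustration.

```python
def jaccard_index(a, b):
    """Return |A ∩ B| / |A ∪ B| for two collections treated as sets."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # assumed convention: two empty sets count as identical
    return len(set_a & set_b) / len(set_a | set_b)


def tokens(cells):
    """Lower-case, whitespace-separated tokens of all cells in a column."""
    return {token for cell in cells for token in cell.lower().split()}


# Made-up column and candidate label vocabularies.
column = ["Madrid", "Paris", "New York"]
city_labels = ["madrid", "paris", "berlin", "new york"]
person_labels = ["richard feynman", "marie curie"]

city_score = jaccard_index(tokens(column), tokens(city_labels))      # 4 shared / 5 total
person_score = jaccard_index(tokens(column), tokens(person_labels))  # no shared tokens
```

The column would be labelled with the candidate that scores highest, here the city vocabulary.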


Probabilistic techniques

Limaye et al. use TF-IDF similarity and graphical models. They also use a support-vector machine to compute the weights. Venetis et al. construct an isA database which consists of (instance, class) pairs and then compute maximum likelihood using these pairs. Alobaid and Corcho approximated the q-q plot for predicting the properties of numeric columns.


Logical techniques

Syed et al. built Wikitology, which is "a hybrid knowledge base of structured and unstructured information extracted from Wikipedia augmented by RDF data from DBpedia and other Linked Data resources." For the Wikitology index, they use PageRank for entity linking, which is one of the tasks often used in semantic labelling. Since they were not able to query Google for the PageRank of all Wikipedia articles, they used a decision tree to approximate it.


Non-ML techniques

Alobaid and Corcho presented an approach to annotate entity columns. The technique starts by annotating the cells in the entity column with the entities from the reference knowledge graph (e.g., DBpedia). The classes are then gathered, and each one is scored based on several formulas they presented, taking into account the frequency of each class and its depth in the subClass hierarchy.


Semantic labelling common tasks

Here are some of the common semantic labelling tasks presented in the literature:


Entity linking and disambiguation

This is the most common task in semantic labelling. Given the text of a cell and a data source, the approach predicts the entity and links it to the one identified in the given data source. For example, if the input to the approach were the text "Richard Feynman" and a URL to the SPARQL endpoint of DBpedia, the approach would return http://dbpedia.org/resource/Richard_Feynman, which is the entity from DBpedia. Some approaches use exact matching, while others use similarity metrics such as cosine similarity.
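
A similarity-based variant of this task can be sketched in Python as follows. The sketch compares character bigram counts with cosine similarity, so that a misspelt cell can still be linked. The hard-coded candidate table is a hypothetical stand-in for a query against a SPARQL endpoint, and the choice of bigrams as features is an assumption, not a method taken from the literature cited here.

```python
import math
from collections import Counter

# Hypothetical stand-in for candidates returned by a knowledge-base lookup.
CANDIDATES = {
    "Richard Feynman": "http://dbpedia.org/resource/Richard_Feynman",
    "Richard Dawkins": "http://dbpedia.org/resource/Richard_Dawkins",
}


def bigrams(text):
    """Count the character bigrams of a lower-cased string."""
    lowered = text.lower()
    return Counter(lowered[i:i + 2] for i in range(len(lowered) - 1))


def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link(cell_text):
    """Return the URI of the candidate label most similar to the cell text."""
    cell_vector = bigrams(cell_text)
    best = max(CANDIDATES, key=lambda label: cosine(cell_vector, bigrams(label)))
    return CANDIDATES[best]


# A misspelt cell still links to the intended entity.
linked = link("Richard Feynmann")
```

An exact-match approach would fail on the misspelt input, which is the practical reason some approaches prefer a similarity metric.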


Subject column identification

The subject column of a table is the column that contains the main subjects/entities in the table. Some approaches expect the subject column as an input, while others, such as TableMiner+, predict the subject column.


Column data-type detection

Column types are divided differently by different approaches. Some divide them into strings/text and numbers, while others divide them further (e.g., number typology, date, coordinates).
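
A coarse version of such a division can be sketched in Python. The three categories, the ISO date format and the majority-vote rule are all assumptions chosen for illustration; published approaches use their own, usually finer, typologies.

```python
from collections import Counter
from datetime import datetime


def cell_type(value):
    """Classify one cell as 'number', 'date' or 'text' (assumed categories)."""
    text = value.strip()
    try:
        float(text)
        return "number"
    except ValueError:
        pass
    try:
        # Only ISO-style dates are recognised in this sketch.
        datetime.strptime(text, "%Y-%m-%d")
        return "date"
    except ValueError:
        return "text"


def column_type(cells):
    """Return the most frequent cell type among the non-empty cells."""
    counts = Counter(cell_type(cell) for cell in cells if cell.strip())
    return counts.most_common(1)[0][0] if counts else "text"


numeric = column_type(["3.5", "42", "n/a"])          # two numbers outvote one text cell
dates = column_type(["2019-01-15", "2005-02-14"])
names = column_type(["Madrid", "Spain"])
```

Voting over cells rather than requiring every cell to parse makes the sketch tolerant of placeholders such as "n/a" in an otherwise numeric column.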


Relation prediction

The relation between Madrid and Spain is "capitalOf". Such relations can easily be found in ontologies, such as DBpedia. Venetis et al. use TextRunner to extract the relation between two columns. Syed et al. look up the relations between the entities of the two columns and select the most frequent relation.
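
The idea of selecting the most frequent relation across the rows of two columns can be sketched as below. This is a simplified illustration rather than the implementation of Syed et al.: the lookup table is a hypothetical stand-in for a knowledge-base query, and "locatedIn" is an invented relation name used only to give the vote something to decide.

```python
from collections import Counter

# Hypothetical relations between entity pairs (stand-in for a knowledge base).
KNOWN_RELATIONS = {
    ("Madrid", "Spain"): ["capitalOf", "locatedIn"],
    ("Paris", "France"): ["capitalOf", "locatedIn"],
    ("Lisbon", "Portugal"): ["capitalOf"],
}


def predict_relation(rows):
    """Return the relation that holds for the most rows, or None."""
    counts = Counter()
    for subject, obj in rows:
        counts.update(KNOWN_RELATIONS.get((subject, obj), []))
    return counts.most_common(1)[0][0] if counts else None


rows = [("Madrid", "Spain"), ("Paris", "France"), ("Lisbon", "Portugal")]
relation = predict_relation(rows)  # "capitalOf" holds for 3 rows, "locatedIn" for 2
```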


Gold standards

T2D is the most common gold standard for semantic labelling. Two versions of T2D exist: T2Dv1 (sometimes referred to simply as T2D) and T2Dv2. Other known benchmarks are published with the SemTab Challenge.


Source control

The "annotate" function (also known as "blame" or "praise") used in source control systems such as Git, Team Foundation Server and Subversion determines who committed changes to the source code into the repository. This outputs a copy of the source code where each line is annotated with the name of the last contributor to edit that line (and possibly a revision number). This can help establish blame in the event a change caused a malfunction, or identify the author of brilliant code.
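
The principle behind such a function can be shown with a toy Python sketch that replays a list of revisions and carries each unchanged line's author forward. It is not how Git or Subversion implement the feature; the revision format, the author names and the use of `difflib` for line matching are assumptions made for illustration.

```python
import difflib


def blame(revisions):
    """Annotate each line of the newest revision with its last editor.

    `revisions` is a list of (author, lines) pairs, oldest first.
    Returns a list of (author, line) pairs for the final revision.
    """
    annotated = []
    for author, new_lines in revisions:
        old_lines = [line for _, line in annotated]
        matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
        updated = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged lines keep the author recorded earlier.
                updated.extend(annotated[i1:i2])
            else:
                # Inserted or replaced lines belong to this revision's author;
                # deleted lines contribute nothing (j1 == j2).
                updated.extend((author, line) for line in new_lines[j1:j2])
        annotated = updated
    return annotated


REVISIONS = [
    ("alice", ["def area(r):", "    return 3.14 * r * r"]),
    ("bob", ["import math", "def area(r):", "    return math.pi * r * r"]),
]

for author, line in blame(REVISIONS):
    print(f"{author:<6} {line}")
```

In the example, the function signature stays attributed to its original author, while the added import and the rewritten return statement are attributed to the later contributor.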


Java annotations

A special case is the Java programming language, where annotations can be used as a special form of syntactic metadata in the source code. Classes, methods, variables, parameters and packages may be annotated. The annotations can be embedded in class files generated by the compiler and may be retained by the Java virtual machine and thus influence the run-time behaviour of an application. It is possible to create meta-annotations out of the existing ones in Java.


Image annotation

Automatic image annotation is used to classify images for image retrieval systems.


Computational biology

Since the 1980s, molecular biology and bioinformatics have created the need for DNA annotation. DNA annotation or genome annotation is the process of identifying the locations of genes and all of the coding regions in a genome and determining what those genes do. An annotation (irrespective of the context) is a note added by way of explanation or commentary. Once a genome is sequenced, it needs to be annotated to make sense of it.
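
One very simplified way to locate candidate coding regions is to scan for open reading frames: stretches that begin with the start codon ATG and end at the first in-frame stop codon. The Python sketch below does this on the forward strand only. It is a teaching example under those assumptions, with a made-up sequence and a hypothetical `min_codons` threshold; real annotation pipelines are far more elaborate.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence, min_codons=2):
    """Return (start, end) index pairs of open reading frames.

    Scans the three forward reading frames. `end` is exclusive and
    includes the stop codon; `min_codons` counts the codons before it.
    """
    sequence = sequence.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(sequence):
            if sequence[i:i + 3] == "ATG":
                for j in range(i + 3, len(sequence) - 2, 3):
                    if sequence[j:j + 3] in STOP_CODONS:
                        if (j - i) // 3 >= min_codons:
                            orfs.append((i, j + 3))
                        i = j  # resume scanning after this stop codon
                        break
            i += 3
    return orfs


# Made-up sequence: ATG AAA TTT TAG starting at index 2.
orfs = find_orfs("CCATGAAATTTTAGGG")
```

Each reported interval would then be a candidate for the second half of annotation, determining what the gene does, which this sketch does not attempt.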


Digital imaging

In the digital imaging community the term annotation is commonly used for visible metadata superimposed on an image without changing the underlying master image, such as sticky notes, virtual laser pointers, circles, arrows, and black-outs (cf. redaction). In the medical imaging community, an annotation is often referred to as a region of interest and is encoded in DICOM format.


Other uses


Law

In the United States, legal publishers such as Thomson West and LexisNexis publish annotated versions of statutes, providing information about court cases that have interpreted the statutes. Both the federal United States Code and state statutes are subject to interpretation by the courts, and the annotated statutes are valuable tools in legal research.


Linguistics

One purpose of annotation is to transform the data into a form suitable for computer-aided analysis. Prior to annotation, an annotation scheme is defined that typically consists of tags. During tagging, transcriptionists use an annotation editor to add tags manually to transcripts wherever the required linguistic features are identified. The annotation scheme ensures that the tags are added consistently across the data set and allows for verification of previously tagged data. Aside from tags, more complex forms of linguistic annotation include the annotation of phrases and relations, e.g., in treebanks. Many different forms of linguistic annotation have been developed, as well as different formats and tools for creating and managing linguistic annotations, as described, for example, in the Linguistic Annotation Wiki.
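
How a scheme supports verification of previously tagged data can be sketched in a few lines of Python. The tagset and the `check_tags` helper are hypothetical and far smaller than a real annotation scheme; the sketch only shows that a fixed set of permitted tags makes inconsistent tagging mechanically detectable.

```python
# Hypothetical annotation scheme: the set of permitted tags.
SCHEME = {"DET", "NOUN", "VERB", "ADJ"}


def check_tags(tagged_tokens, scheme=SCHEME):
    """Return (position, token, tag) for every tag outside the scheme."""
    return [
        (position, token, tag)
        for position, (token, tag) in enumerate(tagged_tokens)
        if tag not in scheme
    ]


# "VRB" is a typo for "VERB" and is reported by the check.
transcript = [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VRB")]
problems = check_tags(transcript)
```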


See also

* Abstract (summary)
* Automatic image annotation
* Coding (social sciences)
* Drama annotation
* Comment (various)
* Footnote
* Hyperkino
* Index (publishing)
* Marginalia
* Metadata
* Nota Bene
* Obelus, a symbol used on ancient manuscripts to mark passages that were suspected of being corrupted or spurious; the practice of adding such marginal notes became known as obelism.
* PDF annotation
* Subject indexing
* Semantics
* Tag (metadata)
* Text annotation
* Web annotation
* XPS annotation

