Text annotation

Text annotation is the practice and the result of adding a note or gloss to a text, which may include highlights or underlining, comments, footnotes, tags, and links. Text annotations can include notes written for a reader's private purposes, as well as shared annotations written for the purposes of collaborative writing and editing, commentary, or social reading and sharing. In some fields, text annotation is comparable to metadata insofar as it is added post hoc and provides information about a text without fundamentally altering that original text. Text annotations are sometimes referred to as marginalia, though some reserve this term specifically for hand-written notes made in the margins of books or manuscripts. Annotations have been found to be useful and to help develop knowledge of English literature. Annotations can be private or socially shared, and hand-written or made with information technology. Annotation differs from note-taking in that an annotation must be written on or added to the original piece itself: for example, a note written on the page of a book or a highlighted line or, if the piece is digital, a comment, saved highlight, or underline within the document. For information on annotation of Web content, including images and other non-textual content, see also Web annotation.


History

Text annotation may be as old as writing on media for which an additional copy could be produced with reasonable effort. It became a prominent activity around 1000 AD in Talmudic commentaries and Arabic treatises on rhetoric. In the Medieval era, scribes who copied manuscripts often made marginal annotations that then circulated with the manuscripts and were thus shared with the community; sometimes annotations were copied over to new versions when such manuscripts were later recopied. With the rise of the printing press and the relative ease of circulating and purchasing individual (rather than shared) copies of texts, the prevalence of socially shared annotations declined and text annotation became a more private activity consisting of a reader interacting with a text. Annotations made on shared copies of texts (such as library books) are sometimes seen as devaluing the text, or as an act of defacement. Thus, print technologies support the circulation of annotations primarily as formal scholarly commentary or textual footnotes or endnotes rather than marginal, handwritten comments made by private readers, though handwritten comments or annotations were common in collaborative writing or editing. Computer-based technologies have provided new opportunities for individual and socially shared text annotations that support multiple purposes, including readers' individual reading goals, learning, social reading, writing and editing, and other practices. Text annotation in Information Technology (IT) systems raises technical issues of access, linkage, and storage that are generally not relevant to paper-based text annotation, and thus research and development of such systems often addresses these areas.


Functions and applications

Text annotations can serve a variety of functions for both private and public reading and communication practices. In their article "From the Margins to the Center: The Future of Annotation," scholars Joanna Wolfe and Christine Neuwirth identify four primary functions that text annotations commonly serve in the modern era: (1) "facilitat[ing] reading and later writing tasks," which includes annotations that support reading for both personal and professional purposes; (2) "eavesdrop[ping] on the insights of other readers," which involves sharing of annotations; (3) "provid[ing] feedback to writers or promote communication with collaborators," which can include personal, professional, and education-related feedback; and (4) "call[ing] attention to topics and important passages," a function often served by scholarly annotations, footnotes, and call-outs. Regarding the ways that annotations can support individual reading tasks, Catherine Marshall points out that the ways readers annotate texts depend on the purpose, motivation, and context of reading. Readers may annotate to help interpret a text, to call attention to a section for future reference or reading, to support memory and recall, to help focus attention on the text as they read, to work out a problem related to the text, or to create annotations not specifically related to the text at all.


Educational applications

Educational research in text annotation has examined the role that both private and shared text annotations can play in supporting learning goals and communication. Much educational research examines how students' private annotation of texts supports comprehension and memory; for example, research indicates that annotating texts causes more in-depth processing of information, which results in greater recall of information. Because annotations are made while reading with a writing utensil in hand, readers are thought to be more aware of their thoughts as they read. Besides making notes that help them remember or better understand the content, readers are actively engaged during the activity and are therefore more receptive to the information when annotating a text.

Other areas of educational research investigate the benefits of socially shared text annotations for collaborative learning, both for paper-based and IT-based annotation sharing. For example, studies by Joanna Wolfe have investigated the benefits of exposure to others' annotations on student readers and writers. In a 2000 study, Wolfe found that exposing students to others' annotations influenced their perceptions of the annotators, which in turn shaped their responses to the material and their written products. In a later study, Wolfe found that viewing others' written comments on a paper text, especially pairs of annotations that present opposing responses to the text, can help students engage in the type of critical reading and stance-taking necessary for effective argumentative writing. While shared annotations can benefit individual readers, it is important to note that, "since the 1920s, literacy theory has increasingly emphasized the importance of social factors in the development of literacy." Thus, shared annotations can not only help one to better understand the content of a particular text, but may also aid in the acquisition of literacy skills. For example, a mother may leave marks inside a book to draw the attention of her child to a particular theme or concept; thanks to the development of audio annotations, parents may now leave notes for children who are just starting to read and may struggle with textual annotations.

More recent research on the effects of shared text annotations has focused on the learning applications of web-based annotation systems, some of which were developed based on design recommendations from the studies outlined above. For example, Ananda Gunawardena, Aaron Tan, and David Kaufer conducted a pilot study to examine whether annotating documents in Classroom Salon, a web-based annotation and social reading platform, encouraged active reading, error detection, and collaboration in a computer science course at Carnegie Mellon University. This study suggested a correlation between students' overall performance in the course and their ability to identify errors in a text that they annotated in Classroom Salon; it also found that students were likely to change their annotations in response to annotations made by others in the course. Similarly, the web-based annotation tool HyLighter was used in a first-year writing course and shown to improve the development of students' mental models of texts, including supporting reading comprehension, critical thinking, and the ability to develop a thesis. The collaboration with peers and experts around a shared text improved these skills and brought the communities' understanding closer together.

A meta-analysis of empirical studies into the higher-education uses of social annotation (SA) tools indicates such tools have been tested in several courses, among them English, sport psychology, and hypermedia. Studies have indicated that social annotation functions, including commenting, information sharing, and highlighting, can support instruction designed to foster collaborative learning and communication, as well as reading comprehension, metacognition, and critical analysis. Several studies indicated that students enjoyed using social annotation tools, and that their use improved motivation in the course.

"Multi-sensory" annotations have also been found to help students retain information in the classroom, and they can help those who are trying to learn a new language. Images can be placed next to or linked to words to give people a better understanding of what a word means. The same can be done with an audio clip of how the word is pronounced, together with its meaning. This is easier to do with technology; to count as an annotation, the image or clip must be embedded within the referenced document. In physical copies of a text, however, a picture can be drawn next to a word and still be a sensory annotation. This form of annotation furthers comprehension, specifically in the classroom, because it engages more of the student's faculties in retaining the information presented.


Writing and text-centered collaboration

Text annotations have long been used in writing and revision processes as a way for reviewers to suggest changes and communicate about a text. In book publishing, for example, the collaboration of authors and editors to develop and revise a manuscript frequently involves exchanges of in-line revisions or notes as well as marginal annotations. Similarly, copyeditors often make marginal annotations or notes that explain or suggest revisions or are directed at the author as questions or suggestions (commonly called "queries"). Asynchronous collaborative writing and document development often depend on text annotations as a way not only to suggest revisions but also to exchange ideas during document development or to facilitate group decision making, though such processes are often complicated by the use of different communication technologies (such as phone calls or emails as well as document sharing) for distinct tasks. Text annotations can also allow group or community members to communicate about a shared text, such as a doctor annotating a patient's chart. Much research into the functionality and design of collaborative IT-based writing systems, which often support text annotation, has occurred in the area of computer-supported cooperative work.


Linguistic annotation

In corpus linguistics, digital philology and natural language processing, annotations are used to explicate linguistic, textual or other features of a text (or other digital representations of natural language). In linguistics, annotations include comments and metadata; non-transcriptional annotations are also non-linguistic. In these disciplines, annotations are the basis for quantitative research, empirical studies and the application of machine learning. Unlike annotations in the above-mentioned uses (which tend to be sparse), linguistic annotation usually requires that every element (token) within a text carries one or multiple annotations, and that complex relations between different annotations exist. A number of specialized formats (and tools) exist for this purpose; the following illustrates an annotation in the format used in the Universal Dependencies project. For clarity, the tab-separated values normally used have been replaced by an HTML table. A visualization of the example is given in Fig. 2. In addition to word-level annotations, the word (and the sentence, etc.) in this format can carry metadata.

Various other annotation formats exist, often coupled with particular pieces of software for their creation, processing or querying; see Ide et al. (2017) for an overview. The Linguistic Annotation Wiki describes tools and formats for creating and managing linguistic annotations. Selected problems and applications are also discussed under Overlapping markup and Web annotation. Aside from tab-separated values and other text formats, formats for linguistic annotations are often based on markup languages such as XML (and formerly, SGML); more complex annotations may also employ graph-based data models and formats such as JSON-LD, e.g., in accordance with the Web Annotation standard.

Linguistic annotation comes with an independent research tradition and its own terminology. The target of an annotation is usually referred to as a 'markable' and the body of the annotation as the 'annotation'. The relation between annotation and markable is usually expressed in the annotation format itself (e.g., by having annotations and text side by side), so that explicit anchors are not necessary.
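A token-level annotation of the kind described above can be sketched in a few lines of Python. The sketch assumes the ten-column, tab-separated CoNLL-U layout used by Universal Dependencies, with sentence-level metadata on comment lines starting with `#`. The sample sentence and its analysis are made up for illustration, and `parse_sentence` is a hypothetical helper rather than part of any library; it also ignores multiword tokens and empty nodes.

```python
# Column names of the CoNLL-U format (one token per line, tab-separated).
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]

# Illustrative sample, not taken from a real treebank.
SAMPLE = "\n".join([
    "# text = The dog barks.",
    "\t".join(["1", "The", "the", "DET", "DT", "Definite=Def|PronType=Art", "2", "det", "_", "_"]),
    "\t".join(["2", "dog", "dog", "NOUN", "NN", "Number=Sing", "3", "nsubj", "_", "_"]),
    "\t".join(["3", "barks", "bark", "VERB", "VBZ", "Number=Sing|Person=3|Tense=Pres", "0", "root", "_", "SpaceAfter=No"]),
    "\t".join(["4", ".", ".", "PUNCT", ".", "_", "3", "punct", "_", "_"]),
])


def parse_sentence(block):
    """Split one sentence block into (metadata dict, list of token dicts)."""
    metadata, tokens = {}, []
    for line in block.splitlines():
        if not line.strip():
            continue
        if line.startswith("#"):
            # Sentence-level metadata, written as "# key = value".
            key, sep, value = line[1:].partition("=")
            if sep:
                metadata[key.strip()] = value.strip()
            continue
        columns = line.split("\t")
        if len(columns) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(columns)}")
        tokens.append(dict(zip(FIELDS, columns)))
    return metadata, tokens


metadata, tokens = parse_sentence(SAMPLE)
by_id = {tok["id"]: tok for tok in tokens}

# Every token carries several annotations plus a relation to another token.
for tok in tokens:
    head_form = "ROOT" if tok["head"] == "0" else by_id[tok["head"]]["form"]
    print(f'{tok["form"]} ({tok["upos"]}) --{tok["deprel"]}--> {head_form}')
```

Note that the token line itself plays the role of the markable: the annotations sit beside the word form, so no separate anchor is stored.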


Structure and design

Research in the design and development of annotation systems uses specific terminology to refer to distinct structural components of annotations and also distinguishes among options for digital annotation displays.


Annotation structure

The structural components of any annotation can be roughly divided into three primary elements: a ''body'', an ''anchor'', and a ''marker''. The body of an annotation includes reader-generated symbols and text, such as handwritten commentary or stars in the margin. The anchor is what indicates the extent of the original text to which the body of the annotation refers; it may include circles around sections, brackets, highlights, underlines, and so on. Annotations may be anchored to very broad stretches of text (such as an entire document) or very narrow sections (such as a specific letter, word, or phrase). The marker is the visual appearance of the anchor, such as whether it is a grey underline or a yellow highlight. An annotation that has a body (such as a comment in the margin) but no specific anchor has no marker.
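The three-part structure above can be sketched as a small data model. The Python below is only an illustration: the class names `Anchor`, `Marker`, and `Annotation`, and the choice of character offsets to represent an anchor, are assumptions made for this sketch and are not taken from any particular annotation system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Anchor:
    """Extent of the source text the annotation refers to (end is exclusive)."""
    start: int
    end: int


@dataclass(frozen=True)
class Marker:
    """Visual appearance of the anchor, e.g. a yellow highlight."""
    style: str  # e.g. "highlight" or "underline"
    color: str  # e.g. "yellow" or "grey"


@dataclass(frozen=True)
class Annotation:
    body: str                        # reader-generated comment or symbol
    anchor: Optional[Anchor] = None  # None: not tied to a specific span
    marker: Optional[Marker] = None

    def __post_init__(self):
        # A marker is the appearance of an anchor, so it cannot exist alone.
        if self.marker is not None and self.anchor is None:
            raise ValueError("a marker requires an anchor")

    def anchored_text(self, source: str) -> Optional[str]:
        """Return the stretch of source text covered by the anchor, if any."""
        if self.anchor is None:
            return None
        return source[self.anchor.start:self.anchor.end]


text = "Text annotation may be as old as writing on media."
note = Annotation(
    body="Compare with medieval glosses",
    anchor=Anchor(start=0, end=15),
    marker=Marker(style="highlight", color="yellow"),
)
margin_star = Annotation(body="*")  # body only: no anchor, hence no marker

print(note.anchored_text(text))         # the span covered by the highlight
print(margin_star.anchored_text(text))  # None, since there is no anchor
```

Keeping the marker separate from the anchor means the same span can be redisplayed with a different appearance without changing what the annotation refers to.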


Annotation display types

IT-based annotation systems utilize a variety of display options for annotations, including:
* Footnote interfaces that display annotations below the corresponding text
* Aligned annotations that display comments and notes vertically in the text margins, sometimes in multiple columns or as a "sidebar" layer
* Interlinear annotations that attach annotations directly into a text
* Sticky note interfaces, where annotations appear in popup dialogs over the source text
* Voice annotations, in which reviewers record annotations and embed them within a document
* Pen or digital-ink based interfaces that allow writing directly on a document or screen
Annotation interfaces may also allow highlighting or underlining, as well as threaded discussions. Sharing and communicating through annotations anchored to specific documents is sometimes referred to as ''anchored discussion''.
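To make the difference between two of these display types concrete, the following Python sketch renders the same set of annotations once as footnotes and once interlinearly. The function names `render_footnotes` and `render_interlinear`, the `(start, end, comment)` tuple layout, and the bracket and brace conventions are all assumptions chosen for illustration; the sketch also assumes that annotated spans do not overlap.

```python
def render_footnotes(text, annotations):
    """Put a numbered mark after each anchored span and list the notes below.

    annotations: list of (start, end, comment) character offsets, end exclusive.
    """
    pieces, notes, cursor = [], [], 0
    for number, (start, end, comment) in enumerate(sorted(annotations), start=1):
        pieces.append(text[cursor:end])
        pieces.append(f"[{number}]")
        notes.append(f"[{number}] {comment}")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces) + "\n\n" + "\n".join(notes)


def render_interlinear(text, annotations):
    """Insert each comment directly into the text, right after its span."""
    pieces, cursor = [], 0
    for start, end, comment in sorted(annotations):
        pieces.append(text[cursor:end])
        pieces.append(" {" + comment + "}")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


text = "Annotations anchor notes to a text."
annotations = [(30, 34, "the source"), (0, 11, "plural noun")]

print(render_footnotes(text, annotations))
print()
print(render_interlinear(text, annotations))
```

Both functions read the same stored annotations; only the presentation differs, which is why systems can offer several display types over one annotation store.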


IT-based text annotation systems

IT-based annotation systems include standalone and client-server systems. In the 1980s and 1990s, a number of such systems were built in the context of libraries, patent offices, and legal text processing. Their design led researchers to produce taxonomies of annotation forms. Text annotation research has taken place at several institutions, including Xerox research centers in Palo Alto and Grenoble (France), the Hitachi Central Research Lab (in particular for annotation of patents), and, in connection with the construction of the new French National Library between 1989 and 1995, at the Institut de Recherche en Informatique de Toulouse and in the company AIS (Advanced Innovation Systems). Annotation functionality has been present in text processing software for many years through inline notes displayed as pop-ups, footnotes, and endnotes; however, it is only recently that functionality for displaying annotations as marginalia has appeared in programs such as OpenOffice.org/LibreOffice Writer and Microsoft Word. Personal or standalone annotation systems include word processing software that supports embedded or anchored text annotations as well as Adobe Acrobat, which in addition to commenting allows highlights, stamps, and other types of markup.


Web-based text annotation systems

Tim Berners-Lee had already implemented the concept of directly editing web documents in 1990 in WorldWideWeb, the first web browser, but later ported versions removed this collaborative ability. An early version of NCSA Mosaic in 1993 also included a collaborative annotation capability, though it was quickly removed. Collaborative authoring was later reintroduced as an extension, Web Distributed Authoring and Versioning (WebDAV). A different approach to distributed authoring consists in first gathering many annotations from a wide public, and then integrating them all in order to produce a further version of a document. This approach was pioneered by Stet, the system put in place to gather comments on drafts of version 3 of the GNU General Public License. This system arose from a specific requirement, which it served well, but it was not so easily configurable as to be convenient for annotating any other document on the web. The co-ment system uses annotation interface concepts similar to Stet's, but it is based on an entirely new implementation, using Django/Python on the server side and various AJAX libraries such as jQuery on the client side. Both Stet and co-ment are licensed under the GNU Affero General Public License.

Since 2011, the non-profit Hypothes.is project has offered the free, open web annotation service Hypothes.is. The service features annotation via a Chrome extension, bookmarklet or proxy server, as well as integration into an LMS or CMS. Both webpages and PDFs can be annotated. Other web-based text annotation systems are collaborative software for distributed text editing and versioning, which also feature annotation and commenting interfaces. Specialized Web-based text annotation systems exist in the context of scientific publication, either for refereeing or for post-publication comment. The on-line journal PLoS ONE, published by the Public Library of Science, has developed its own Web-based system where scientists and the public can comment on published articles. The annotations are displayed as pop-ups with an anchor in the text.


See also

* Annotation
* Web annotation
* Gloss (annotation)
* Interlinear gloss
* Footnote
* PDF annotation
* Marginalia
* Social bookmarking
* Comment (computer programming)


External links


Effects of annotations on student readers and writers

Annotations and the Collaborative Digital Library: Effects of an Aligned Annotation Interface on Student Argumentation and Reading Strategies

From the Margins to the Center: ''The Future of Annotation''

Bringing Social Media to the Writing Classroom: Classroom Salon, which discusses how social media can facilitate collaboration in writing classrooms

Asynchronous Collaborative Writing through Annotations, which describes how the benefits of physical annotations can be brought into a digital environment