COCOA (an acronym derived from COunt and COncordance Generation on Atlas) was an early text file utility and associated file format for digital humanities, then known as humanities computing. It consisted of approximately 4,000 punched cards of FORTRAN and was created in the late 1960s and early 1970s at University College London and the Atlas Computer Laboratory in Harwell, Oxfordshire. Its functionality included word counting and concordance building.
Oxford Concordance Program
The Oxford Concordance Program format was a direct descendant of COCOA, developed at Oxford University Computing Services. The Oxford Text Archive holds items in this format.
Later developments
The COCOA file format bears at least a passing similarity to later markup languages such as SGML and XML. A notable difference from its successors is that COCOA tags are flat rather than tree-structured: every information type and value encoded by a tag is considered to hold until the same tag changes its value. Members of the Text Encoding Initiative community maintain legacy support for COCOA, although most in-demand texts and corpora have already been migrated to more widely understood formats such as TEI XML.
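The flat, state-based behaviour of the tags can be illustrated with a short Python sketch. It is not a reconstruction of the original FORTRAN program. The tag syntax used here (a category code and a value inside angle brackets, such as `<T Hamlet>`), the function name `annotate`, and the sample text are assumptions made for illustration only.

```python
import re

# Assumed tag syntax for illustration: <CATEGORY value>, e.g. <T Hamlet>.
TAG = re.compile(r"<(\w+)\s+([^>]*)>")


def annotate(lines):
    """Pair each line of text with the tag values in force at that point.

    Tags are flat: a value stays in force until the same category is
    set again. Nothing is ever closed or nested.
    Simplification: a tag anywhere on a line applies to that whole line.
    """
    state = {}
    annotated = []
    for line in lines:
        # Update the running state with every tag found on this line.
        for category, value in TAG.findall(line):
            state[category] = value.strip()
        # Whatever remains after removing the tags is the text itself.
        text = TAG.sub("", line).strip()
        if text:
            annotated.append((text, dict(state)))
    return annotated


# Hypothetical sample: T = title, A = act.
sample = [
    "<T Hamlet> <A 1>",
    "Who's there?",
    "<A 2>",
    "To be, or not to be",
]

for text, tags in annotate(sample):
    print(tags, text)
```

In this sketch the second text line still carries the title `Hamlet`, because only the `A` category was reassigned. A tree-structured format such as XML would instead express the same scope by nesting elements.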