COCOA (an acronym derived from COunt and COncordance Generation on Atlas) was an early text file utility and associated file format for digital humanities, then known as humanities computing. The program consisted of approximately 4000 punched cards of FORTRAN and was created in the late 1960s and early 1970s at University College London and the Atlas Computer Laboratory in Harwell, Oxfordshire. Its functionality included word counting and concordance building.
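
As a rough modern illustration of those two functions, the Python sketch below counts word frequencies and builds a simple keyword-in-context concordance. It is not a reconstruction of the original FORTRAN program; the function names, the tokenization rule, and the context width are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed rule: lowercase the text and keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def word_counts(text):
    # Frequency of each distinct word form.
    return Counter(tokenize(text))


def concordance(text, keyword, width=2):
    # For every occurrence of keyword, return (left context, keyword, right context),
    # where each context holds up to `width` neighbouring words.
    tokens = tokenize(text)
    target = keyword.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


text = "To be, or not to be, that is the question"
counts = word_counts(text)
kwic = concordance(text, "be")
```

Here `counts` maps each word to its number of occurrences, and `kwic` lists each occurrence of "be" together with its neighbouring words.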


Oxford Concordance Program

The Oxford Concordance Program format, developed at Oxford University Computing Services, was a direct descendant of COCOA. The Oxford Text Archive holds items in this format.


Later developments

The COCOA file format bears at least a passing similarity to later markup languages such as SGML and XML. A noticeable difference from its successors is that COCOA tags are flat rather than tree-structured. In that format, every information type and value encoded by a tag should be considered true until the same tag changes its value. Members of the Text Encoding Initiative community maintain legacy support for COCOA, although most in-demand texts and corpora have already been migrated to more widely understood formats such as TEI XML.
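
The "true until changed" behaviour of flat tags can be sketched in Python as follows. The tag shape used here, `<X value>` with a single-character category code, is an assumption made for illustration, as are the sample text and the `annotate` helper; none of them is taken from a COCOA specification.

```python
import re

# Assumed tag shape: "<X value>", where X is a single-character category code.
TAG = re.compile(r"<(\w)\s+([^>]*)>")


def annotate(lines):
    # Yield (state, text) pairs, where state holds the tag values currently in force.
    state = {}
    for line in lines:
        # A tag overwrites the current value for its category only;
        # all other categories keep their earlier values.
        for category, value in TAG.findall(line):
            state[category] = value.strip()
        text = TAG.sub("", line).strip()
        if text:
            # Copy the state so later tag changes do not alter earlier results.
            yield dict(state), text


sample = [
    "<A Shakespeare>",
    "<T Hamlet>",
    "<S Hamlet>",
    "To be, or not to be",
    "<S Ophelia>",
    "Good my lord",
]
result = list(annotate(sample))
```

In this sketch, the second text line still carries the `A` and `T` values set at the top, while `S` has been replaced. There is no closing tag and no nesting: the only state is one current value per category, which is what distinguishes this flat scheme from the tree structure of SGML and XML. For simplicity, the sketch applies every tag on a line before reading that line's text.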

