The Text Creation Partnership (TCP) is a not-for-profit organization based in the library of the University of Michigan. Its purpose is to produce large-scale full-text electronic resources (especially in the humanities) on behalf of both member institutions (particularly academic libraries) and scholarly publishers, under an arrangement calculated to serve the needs of both, and in so doing to demonstrate the value of a business model that sees corporate and non-profit information-providers as potentially amicable collaborators rather than as antagonistic vendors and customers respectively.


Projects

TCP has sponsored four text-creation projects to date. The first and the largest is "EEBO-TCP (Phase I)" (2001–2009), an effort to produce structurally marked-up full-text transcriptions of 25,000+ of the roughly 125,000 books to be found either in the Pollard and Redgrave and Wing short-title catalogues of early English printed books, or among the Thomason Tracts, that is, from among nearly all books, pamphlets, and broadsides published in English or in England before 1700. The books were selected and transcribed from the digital scans produced by ProQuest Information and Learning, and distributed by them as a web-based product under the name "Early English Books Online" (EEBO). The scans from which the texts were transcribed were themselves made from the microfilm copies made over the years by ProQuest and its antecedent companies, including the original University Microfilms, Inc. EEBO-TCP Phase I concluded at the end of 2009, having transcribed about 25,300 titles, and immediately moved into EEBO-TCP Phase II (2009–), a sequel project dedicated to converting all the remaining unique English-language monographs (roughly 45,000 additional titles).

The third TCP project was Evans-TCP (2003–2007, with some ongoing work through 2010), an effort to transcribe 6,000 of the 36,000 pre-1800 titles listed in Charles Evans' American Bibliography. These titles are distributed, again as page images scanned from microfilm copies, by Readex, a division of NewsBank, Inc., under the name "Archive of Americana" ("Early American Imprints, series I: Evans, 1639–1800"). Evans-TCP has produced e-texts of nearly 5,000 books.

The final TCP project was ECCO-TCP (2005–2010, with some work ongoing), an effort to transcribe 10,000 eighteenth-century books from among the 136,000 titles available in Thomson-Gale's web-based resource, "Eighteenth-Century Collections Online" (ECCO). ECCO-TCP ran out of funding in 2010 after transcribing about 3,000 (and editing about 2,400) titles.


Project commonalities

All four TCP text projects are very similar. In each case:
# The TCP produces text from commercial image files that have in turn been created from microfilm copies of early books.
# The commercial image providers receive what is in effect a full-text index to their image product for much less than it would cost to produce themselves: value added to their product.
# The partner libraries actually own, rather than simply license, the resultant texts, and are free (subject to some conditions) to mount the texts themselves in whatever system they like, or use the texts internally as a tool of scholarship and teaching.
# The texts are created according to library-determined standards, uniform across multiple data-sets and potentially cross-searchable.
# Because they are created collaboratively, the texts are relatively inexpensive (on a per-book basis) and become more so with each library that joins the partnership.
# The texts will eventually be made freely accessible to the public at large.
# The selection of texts to convert, though differing from project to project, in each case follows similar principles: variety, significance, representative quality, avoidance of duplication; specific requests from faculty or scholarly initiatives at member institutions are also generally honored.
# TCP has hitherto been primarily interested in creating texts, not in creating a "product"; though texts from all of the projects are or will be mounted on servers at the University of Michigan library, the Michigan site is not the official TCP site: any partner library with adequate resources and safeguards may do the same. EEBO-TCP texts, for example, are served by Michigan, ProQuest, the Oxford University Digital Library, and the University of Chicago.


Organization

The TCP is overseen by a Board of Directors, drawn chiefly from senior library administrators at partner institutions, representatives of the corporate partners, and the Council on Library and Information Resources (CLIR). The Board is assisted in matters of selection and scholarship by an academic advisory group that includes faculty in the fields of early modern English and American studies.

The TCP has informal ties to a number of university-based scholarly text projects, especially in helping to provide them with source texts with which to work. Institutions represented include Northwestern University, University of Oxford, Washington University in St. Louis, University of Sydney, University of Toronto, and University of Victoria. TCP has also worked with students by sponsoring an Undergraduate Essay Contest every year, convening task forces on the uses of TCP texts in pedagogy, and appealing to scholars and students for ideas on selection and use.

Text production is managed through the University of Michigan's Digital Library Production Service (DLPS), with its extensive experience in the production of SGML/XML-encoded electronic texts. DLPS is assisted by the University of Oxford's Bodleian Digital Libraries Systems & Services (BDLSS), including the late Sebastian Rahtz. Small part-time production operations have also been started within two other libraries: the Centre for Reformation and Renaissance Studies in Pratt Library (Victoria University in the University of Toronto), specializing in Latin books; and the National Library of Wales (Llyfrgell Genedlaethol Cymru) in Aberystwyth, specializing in Welsh books.


Standards

All four TCP text projects are produced in the same way and to the same standards, which are documented, at least in part, on the TCP web site.
# Accuracy. The TCP strives to produce texts that are as accurately transcribed as possible, with a specified overall accuracy rate of 99.995% or better (i.e. one error or fewer per 20,000 characters).
# Keying. Given the nature of the material, the only method found to deliver such accuracy economically has been to have the books keyed by data conversion firms under contract.
# Quality control. Accuracy of transcription and aptness of markup are assessed in all cases by a group of library-based proofers and reviewers managed by the University of Michigan DLPS.
# Encoding. All resultant text files are marked up in valid SGML or XML (SGML is archived, XML is exported) conforming to a proprietary "Document Type Definition" (DTD) derived from the P3/P4 version of the Text Encoding Initiative (TEI) standard.
# Purposeful markup. Compared to the full TEI, the TCP DTD is very simple and intended to capture only the features most useful for intelligible display, intelligent navigation, and productive searching. The TCP practice is to capture, so far as feasible, the overall hierarchical structure of each book (parts, sections, chapters, etc.); the features that tend to mark the beginnings and ends of divisions (headings, explicits, salutations, valedictions, datelines, bylines, epigraphs, etc.); the most significant elements of discourse and organization (paragraphs in prose, lines and stanzas in verse, speeches, speakers, and stage directions in drama, notes, block quotes, sequential numerations of all kinds); and only the most essential aspects of physical formatting (page breaks, lists, tables, font changes).
# Fidelity to the original. In each case, the text is intended to represent the book as originally printed, so far as that is possible. Printer's errors are preserved, hand-written changes are ignored, duplicate scans are omitted, out-of-order images are keyed in the intended order, and most of the unusual characters of the original are preserved.
# Ease of reading and searching. The transcriptions are carried out character-by-character. At the same time, on the theory that all transcription is a kind of translation from one symbolic system to another, TCP tends to define characters in terms more of their meaning than of their form, and to map eccentric letter-forms to meaningful modern equivalents, generally in keeping with the Unicode definition of "character."
# Languages. Though most of the TCP texts are in English, many are not. Books and divisions of books not in English are tagged with an appropriate language code, but are not otherwise distinguished.
# Omitted material. The TCP produces Latin-alphabet text. Non-textual items such as musical notation, mathematical formulae, and illustrations (except for any text they may contain) are omitted and their locations marked with a special tag. Extended text in non-Latin alphabets (Greek, Hebrew, Persian, etc.) is also omitted.
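
The accuracy standard above reduces to simple arithmetic: a rate of 99.995% leaves an error budget of 0.005%, which is one character in 20,000. The following Python sketch works that out for a text of a given length. It is only an illustration of the stated figure, not a description of how TCP reviewers actually measure accuracy; the function names and the 500,000-character sample length are assumptions made for the example.

```python
from fractions import Fraction

# Target stated in the text: 99.995% accuracy,
# i.e. one error or fewer per 20,000 characters.
TARGET_ACCURACY = Fraction(99995, 100000)


def max_errors_allowed(char_count: int) -> int:
    """Largest whole number of errors that still meets the target.

    Hypothetical helper: exact rational arithmetic avoids the
    rounding surprises of floating point at the threshold.
    """
    error_budget = 1 - TARGET_ACCURACY  # exactly 1/20000
    return int(char_count * error_budget)  # round down


def meets_target(char_count: int, error_count: int) -> bool:
    """True if a transcription with this many errors meets the target."""
    if char_count <= 0:
        raise ValueError("char_count must be positive")
    accuracy = Fraction(char_count - error_count, char_count)
    return accuracy >= TARGET_ACCURACY


if __name__ == "__main__":
    # Assumed sample size: a book of 500,000 keyed characters.
    print(max_errors_allowed(500_000))
    print(meets_target(20_000, 1), meets_target(20_000, 2))
```

Under this reading, a 20,000-character text with one error sits exactly on the threshold, while a second error puts it below.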


Accomplishments and prospects

As of April 2011, the TCP had created about 40,000 searchable, navigable, full-text transcriptions of early books, a database of unmatched scope, scale, and utility to students in many fields. Whether it will be able to go on to produce the remaining 38,000 texts included in its ambitious recent plans (for EEBO-TCP Phase II) will depend on the validity of its original vision, arising from the theory that libraries could and should cooperate to become producers and standard-setters rather than consumers; and that universities and commercial firms, despite their very different life-cycles, constraints, and motives, could join in durable partnerships of benefit to all parties.

As of January 1, 2015, the full text of EEBO-TCP Phase I has been released under a Creative Commons license, and can be freely downloaded and distributed. In 2014 there were 28,466 titles available via Phase II. As of July 2015, ProQuest had the exclusive right for five years to distribute the EEBO-TCP Phase II collection. After those five years the texts will be made freely available to the public.


See also

* Book scanning
* Books in the United Kingdom
* Books in the United States


External links


* Main (Michigan) TCP web site
* Oxford TCP web site
* Internal TCP documentation
* EEBO Phase I full-text download
* Demonstration sites (open to the public) for
** EEBO-TCP
** ECCO-TCP
** Evans-TCP
* Database-access sites (open to members of partner institutions) for
** EEBO-TCP at
*** the University of Michigan (via DLXS)
*** the University of Chicago (via PhiloLogic)
*** Oxford University (via DLXS)
*** the ProQuest EEBO site
** Evans-TCP at the University of Michigan (via DLXS)
** ECCO-TCP at the University of Michigan (via DLXS)