Project Gutenberg (PG) is a volunteer effort to digitize and archive cultural works, as well as to "encourage the creation and distribution of eBooks." It was founded in 1971 by American writer Michael S. Hart and is the oldest digital library. Most of the items in its collection are the full texts of books or individual stories in the public domain. All files can be accessed for free in an open format, readable on almost any computer. The collection has grown to more than 50,000 free eBooks.
Releases are available in plain text and, wherever possible, in other formats such as HTML, PDF, EPUB, MOBI, and Plucker. Most releases are in English, but many non-English works are also available. Several affiliated projects provide additional content, including region- and language-specific works. Project Gutenberg is closely affiliated with Distributed Proofreaders, an Internet-based community for proofreading scanned texts.
Project Gutenberg is named after Johannes Gutenberg, who introduced book printing with movable type in Europe.
History
Michael S. Hart began Project Gutenberg in 1971 with the digitization of the United States Declaration of Independence. Hart, a student at the University of Illinois, obtained access to a Xerox Sigma V mainframe computer in the university's Materials Research Lab. Through friendly operators, he received an account with a virtually unlimited amount of computer time; its value at the time has since been variously estimated at $100,000 or $100,000,000. Hart explained that he wanted to "give back" this gift by doing something that could be considered of great value. His initial goal was to make the 10,000 most consulted books available to the public at little or no charge by the end of the 20th century.
"On July 4, 1971, after being inspired by a free printed copy of the U.S. Declaration of Independence, he decided to type the text into a computer, and to transmit it to other users on the computer network." (Gregory B. Newby)
This particular computer was one of the 15 nodes on ARPANET, the computer network that would become the Internet. Hart believed that one day the general public would be able to access computers, and he decided to make works of literature available in electronic form for free. He used a copy of the United States Declaration of Independence that was in his backpack, and this became the first Project Gutenberg e-text. He named the project for Johannes Gutenberg, the fifteenth-century German printer who propelled the movable-type printing press revolution.
By the mid-1990s, Hart was running Project Gutenberg from Illinois Benedictine College, and more volunteers had joined the effort. Hart had entered all of the text manually until 1989, when image scanners and optical character recognition software improved and became more widely available, making book scanning more feasible. Hart later came to an arrangement with Carnegie Mellon University, which agreed to administer Project Gutenberg's finances. As the volume of e-texts increased, volunteers began to take over the day-to-day operations that Hart had previously run.
Italian volunteer Pietro Di Miceli developed and administered the first Project Gutenberg website and began developing the project's online catalog. During his ten years in this role (1994–2004), the project's web pages won a number of awards and were often featured in "best of the Web" listings, contributing to the project's popularity.
Starting in 2004, an improved online catalog made Project Gutenberg content easier to browse, access, and hyperlink. Project Gutenberg is now hosted by ibiblio at the University of North Carolina at Chapel Hill.
Hart died on 6 September 2011 at his home in Urbana, Illinois, at the age of 64.
CD and DVD project
In August 2003, Project Gutenberg created a CD containing approximately 600 of the "best" e-books from the collection. The CD is available for download as an ISO image. Users who are unable to download the CD can request a copy by mail, free of charge.
In December 2003, a DVD was created containing nearly 10,000 items. At the time, this represented almost the entire collection. In early 2004, the DVD also became available by mail.
In July 2007, a new edition of the DVD was released containing over 17,000 books, and in April 2010, a dual-layer DVD was released, containing nearly 30,000 items.
Most of the DVDs, and all of the CDs mailed by the project, were recorded on recordable media by volunteers. However, the newer dual-layer DVDs were commercially manufactured, because this proved more economical than having volunteers burn them. In total, the project has mailed approximately 40,000 discs. As of 2017, free CDs are no longer mailed, though the ISO image is still available for download.
Scope of collection
Project Gutenberg's collection continues to grow, with an average of over 50 new e-books added each week. These are primarily works of literature from the Western cultural tradition. In addition to literature such as novels, poetry, short stories, and drama, Project Gutenberg also has cookbooks, reference works, and issues of periodicals. The collection also includes a few non-text items such as audio files and music-notation files.
Most releases are in English, but there are also significant numbers in many other languages. The non-English languages most represented are French, German, Finnish, Dutch, Italian, and Portuguese.
Whenever possible, Gutenberg releases are available in plain text, mainly using US-ASCII character encoding but frequently extended to ISO-8859-1 (needed, for example, to represent accented characters in French and the German sharp s, ß). Besides being copyright-free, each release must have a plain-text version in a Latin character set; this has been one of Michael Hart's criteria since the founding of Project Gutenberg, as he believed this format was the most likely to remain readable in the extended future. Out of necessity, this criterion has had to be extended further for the sizable number of texts in East Asian languages such as Chinese and Japanese now in the collection, for which UTF-8 is used instead.
Other formats may also be released when submitted by volunteers. The most common non-ASCII format is HTML, which allows markup and illustrations to be included. Some project members and users have requested more advanced formats, believing them to be easier to read. However, formats that are not easily editable, such as PDF, are generally not considered to fit the goals of Project Gutenberg. Project Gutenberg also accepts two master formats for submissions, from which all other files are generated: customized versions of the Text Encoding Initiative standard (since 2005) and reStructuredText (since 2011).
Beginning in 2009, the Project Gutenberg catalog has offered auto-generated alternate file formats, including HTML (when not already provided), EPUB, and Plucker.
Ideals
Michael Hart said in 2004, "The mission of Project Gutenberg is simple: 'To encourage the creation and distribution of ebooks'". His goal was "to provide as many e-books in as many formats as possible for the entire world to read in as many languages as possible". Likewise, a project slogan is to "break down the bars of ignorance and illiteracy", because its volunteers aim to continue spreading public literacy and appreciation for the literary heritage just as