Internet Archives

Internet Archive: an attempt to take "snapshots" of the billions of web pages that come into existence (and disappear) every year.

LOCKSS (Lots of Copies Keeps Stuff Safe): a peer-to-peer technology application designed to ensure the long-term survival of digital media of all types.

British Library--Online Gallery: digital surrogates sampling a range of the library's collected texts and objects.  The most ambitious and completely imaged are the "Virtual Books."  Most are medieval, but the collection includes some modern manuscript works by William Blake, Wolfgang Mozart, Lewis Carroll, and Jane Austen. Readers must download a Microsoft application before the "Turning the Pages" software can operate.

Library of Congress--American Memory: a range of American artifacts partially or fully digitized.  The "literature" section is tiny compared with the British Library's "Virtual Books" collection.  Two extensive manuscript collections cover Zora Neale Hurston and Walt Whitman, but the others represent relatively unknown nineteenth-century authors.  The "manuscript" collection also includes some work by literary figures.

World Digital Library: an (insanely?) ambitious UNESCO project which holds digital surrogates from 8000 BCE to the current year, and from nine geographical regions covering most of the Earth.  Maps and visual art appear to constitute the bulk of the collection at the moment (8/2009).

The Oxyrhynchus Papyri: half a million pieces of papyrus manuscript found in the late nineteenth and early twentieth centuries in an ancient midden heap at Oxyrhynchus, south of the Egyptian city of Memphis.  Most texts are in Greek, the language spoken by the Ptolemaic rulers of Egypt after the fragmentation of Alexander's Macedonian empire, and they have contributed important new texts to the study of the early history of Christianity and of Greek secular authors.  They were removed to the University of Oxford, where they are still being studied.

Internet Archives Holding Two Unique MSS: The "Archimedes Palimpsest" and the "Codex Sinaiticus" are two complex parchment manuscripts, from the tenth and fourth centuries CE respectively, that have their own web archives for quite different reasons.  The "Archimedes" MS was an otherwise uninteresting thirteenth-century prayer book written on top of a partially erased tenth-century manuscript that contained seven treatises by Archimedes (ca. 287-212 BCE), two of which survive only in this manuscript.  When an older text shows through a newer one written over it, the result is called a "palimpsest."  The undertext preserves works of Archimedes that survive nowhere else in Greek, and the palimpsest also preserves unique copies of speeches by Hyperides, an orator in Classical Athens (ca. 390-322 BCE).  The manuscript was decoded by a heroic collaborative effort involving Greek scholars, mathematicians, manuscript conservators, and experts in X-ray fluorescence imaging and optical character recognition.  The manuscript itself, bought by an anonymous collector for two million dollars in 1998, was disassembled, cleaned, and studied at the Walters Art Museum in Baltimore by curator Will Noel and conservator Abigail Quandt.
The "Codex Sinaiticus" (Latin, "Sinai book") once contained the oldest complete text of the work Christians call "the Bible," but its leaves were broken up and scattered among collections in several countries.  The text was written in the mid-fourth century and, like most Christian texts (and unlike most pagan or Jewish texts of the time), it was made in the form of a book rather than a scroll.  The "Sinaiticus" contains a heavily corrected version of the Septuagint and the New Testament, the now-canonical biblical books combined in roughly the same order as they now appear in Bibles authorized by Christian churches.  In surviving manuscripts produced before this time, those texts occur only in separate manuscripts, grouped in various ways or alone.
In the mid-nineteenth century, portions of the manuscript were taken from the Monastery of Saint Catherine, on Mount Sinai, in whose library the manuscript had been protected for centuries, to Europe for publication.  The largest portion (347 leaves) was held in the Soviet Union until its purchase by the British Library in 1933.  Other leaves remain at Leipzig's University Library, the National Library of Russia (St. Petersburg), and Saint Catherine's Monastery, where leaves and fragments were discovered hidden in a wall in 1975.  Beginning in 2005, all the scattered leaves of the codex were imaged and stored on one web site, making it possible to study the entire manuscript once more for the first time in hundreds, or perhaps a thousand, years.  Both of these projects could only have been accomplished using the Internet and modern imaging technology.

Internet Backbone Map: because text stored on the Internet is not affixed to a paper or parchment substrate, but rather manifested in pixels on screens when data is piped to the CPU which runs these displays, the pipelines through which the data travels must also be understood as part of the "archeology of Internet text."  This map, created in 2006, is now obsolete for operational purposes, but it will do to represent this component of "Internet text" for our discussions.

Bob Lash, "Memoir of a Homebrew Computer Club Member": before the personal computer was made commercially practical by Jobs and Wozniak (Apple's founders), there was a period of excited amateur computer construction and programming that swept America in the 1970s and 1980s.  Bob Lash was lucky enough to live in Palo Alto, California, near Stanford University, where many important experiments in computer design were going on.  Palo Alto was also home to the Palo Alto Research Center (AKA "Xerox PARC"), site of the nation's first experiments in "graphical user interfaces" that represented programs and documents as icons on a display screen and activated them with a moveable cursor controlled by a hand-driven "mouse."  PARC produced countless other inventions that Xerox (weirdly) never took commercial advantage of, hence Apple's success.  Homebrew was a crucial "incubator" of hardware, software, and networking talent that made major contributions to the Internet's invention and development.

"How the Moby Shakespeare Took Over the Internet": (Eric M. Johnson, George Mason University M.A. Thesis): while pursuing his M.A. project, the "Open-Source Shakespeare," he uncovered the fascinating story of how a digitized copy of the 1864 Globe edition Shakespeare's works came to spread, weed-like, to all corners of the Internet.  The "Moby Shakespeare" is so old, in Internet-years, that its origins cannot be authoritatively determined other than that it must derive from some version of the Globe edition.  The essay linked above can serve as an illustration of how the web economy reproduces "Gresham's Law," which predicts that the weaker/cheaper currency drives out the stronger.  The prevalence and widespread acceptance (by amateurs) of the Moby Shakespeare text, owned and protected by no scholarly authority, slowed or stopped production of up-to-date online Shakespeare editions produced by scholars.

The National Archive (a hybrid site: mostly print / part digital).

Modern Literature Online Surrogate Sites (Note: these sites are based on digitized early printed books and manuscripts, but they have achieved enough funding and infrastructural durability to be considered archives in their own right.)

Project Gutenberg: http://www.gutenberg.org/wiki/Main_Page  (The oldest such site on the Internet, Project Gutenberg was founded by Michael Hart in 1971 to make freely available out-of-copyright works of literature, political documents, etc.)

The Rossetti Archive: http://www.rossettiarchive.org/  (Dante Gabriel Rossetti, at the University of Virginia)

The William Blake Archive: http://www.blakearchive.org/blake/

German Emblem Books (Early Modern Print): http://images.library.uiuc.edu/projects/emblems/

Renaissance Literature Online Surrogate Sites

Project Perseus Renaissance Online (Marlowe, Shakespeare, etc.): http://www.perseus.tufts.edu/hopper/collection?collection=Perseus:collection:Renaissance 

Note that Project Perseus began as a Greek and Latin classical literature text-base, first sold as a CD-ROM in the 1980s, and later migrated to the World Wide Web on a server at Tufts University, which is its current home in 2015.  A great many other texts can be found there, but like all digital archives, they are available only as long as the chain connecting their digital files on the Tufts server, the Internet link, and your browsing device remains unbroken.

2007 Press Stories about Digital Texts as Literature and Digital Search Engines as Archivists