The LoC: A Leader and a Work in Progress



BEST PRACTICES SERIES

In 1990, before the World Wide Web played a role in the lives of the general public, the Library of Congress (LoC) set out to build a digital library. The LoC has long served as the nation's library, where the creative and historical endeavors of Americans are ensured a lasting home. The goal of the Library of Congress is "to record our cultural heritage," according to Laura Campbell, associate librarian for the National Digital Library at the LoC. "We don't subtract, we just add, and now we have to add all of this digital material." While this is an admirable undertaking, in practice the questions of what to preserve and how to preserve it have moved to the forefront.

The original digital endeavor by the LoC resulted in the American Memory Collection, a multimedia archive containing approximately eight million digitized items such as the first movies ever produced. The freely accessible Collection has primarily been used for educational purposes. Initially, items in the Collection were often selected because they were in the public domain, which circumvented copyright issues. Other acquisition methods include purchase, copyright deposit, exchanges with other countries, licensing, and gifts or donations, which have become an enormous source for the Collection. "We didn't know we were leading," says Campbell about beginning to amass the digital collection, "but others were following and that was mighty scary because we didn't know what we were doing." They made decisions as they went along and, serendipitously, most of it worked.

In December 2000, $175 million in federal and private funding was earmarked for the creation of a collaborative project, the National Digital Information Infrastructure and Preservation Program (NDIIPP), to be led by the LoC. According to Campbell, however, $175 million "is not a huge sum of money given the challenges involved." Congress also stipulated that the LoC work with other federal agencies such as the White House and the U.S. National Archives and Records Administration, as well as with libraries and private establishments that are involved in the creation and distribution of digital content.

Priorities for the NDIIPP are intellectual property and liability issues, the scope of collecting, and maintaining the balance between preservation and access. The program identified three scenarios for preservation: Triage, where only the most vital information and content is saved (though Campbell notes the inherent subjectivity of a definition like "most vital"); a Congress of Libraries; and Universal Libraries, where everyone saves everything. Campbell believes that Universal Libraries are still some way off, as they would require a solid economy and rapid technological development. The program's staff have found, with some surprise, that the technology has not been the most difficult part. "The human element has been the hardest part of the problem," she explains. "Who is going to do what for whom?"

The architecture for the program initially consisted of four layers: a Repository layer for storage, a Gateway layer for managing access to Repositories, the Collection layer itself, and the Interface layer where patrons interact with the material. In the revised version 0.2 architecture, the layers have been renamed Upper (the Interface), Middle (where the content resides), and Lower (the storage level). The functions of the former Gateway layer have been spread out over these three layers. Version 0.2 has been released for review by interested parties, and the LoC is encouraging feedback on the architecture.
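The relationship between the original four layers and the three version 0.2 layers can be sketched in a few lines of Python. This is only an illustration of the description above, not an NDIIPP specification: the function name is hypothetical, and pairing Middle with the Collection layer and Lower with the Repository layer is an interpretation of "where the content resides" and "the storage level."

```python
from typing import List

# Layer names come from the article; the old-to-new pairing below is an
# assumption made for illustration, not an official NDIIPP mapping.
ORIGINAL_LAYERS = ("Interface", "Collection", "Gateway", "Repository")

# Version 0.2 layer -> the original layer it most closely corresponds to.
V02_LAYERS = {
    "Upper": "Interface",    # where patrons interact with the material
    "Middle": "Collection",  # where the content resides (assumed pairing)
    "Lower": "Repository",   # the storage level (assumed pairing)
}


def v02_layers_for(original_layer: str) -> List[str]:
    """Return the version 0.2 layer(s) that take over an original layer's role."""
    if original_layer not in ORIGINAL_LAYERS:
        raise ValueError(f"unknown layer: {original_layer}")
    if original_layer == "Gateway":
        # The Gateway's functions were spread across all three new layers.
        return list(V02_LAYERS)
    return [new for new, old in V02_LAYERS.items() if old == original_layer]


if __name__ == "__main__":
    for layer in ORIGINAL_LAYERS:
        print(layer, "->", ", ".join(v02_layers_for(layer)))
```

The point of the sketch is that three of the original layers map one-to-one onto the new names, while the Gateway has no single successor.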

The NDIIPP made great strides in 2003. In addition to refining its technical architecture, it identified existing architectures to test and developed a test bed, which was recently launched with six different architectures. It also developed technical specifications for an open source crawler and, in November, received 22 proposals for content partners from sources such as news broadcasts and social science databases. Content partners for the program must be organizations that already have a level of technological competency as well as a preservation strategy. They must be domestic, though they can be commercial. The NDIIPP is currently working on fail-safe agreements so that the LoC retains content if a partner goes under.

For now, the NDIIPP archives content in the format in which it arrives. Particularly because contributions are voluntary, the program cannot ensure that specifications will be adhered to, but it plans to establish specs in the hope of future uniformity. It is also beginning to migrate earlier work and content, though this is primarily to facilitate more effective searching rather than out of concern about decay.

In fact, the issue of migration has become a sticking point for some. In studies conducted by the LoC, questions were raised as to whether migrating content (for example, from one MPEG standard to the next) actually destroys the original content instead of improving it. The studies question whether the "improvement" in sound quality that would arise from migrating a song recorded on less sophisticated equipment detracts from the integrity of the original by misrepresenting the recording technology of the time.

The NDIIPP planned to announce its first group of content partners in April. Looking to the immediate future, it also plans to make business model announcements, issue a call for research with the National Science Foundation, work on prototype development, and test six to eight of its existing architectures.

The mission of the NDIIPP is to "develop a national strategy to collect, archive, and preserve the burgeoning amounts of digital content, especially materials that are created only in digital formats, for current and future generations." According to Campbell, "Our strategy is simple: It is incremental, to learn by doing."
(www.digitalpreservation.gov)