Libraries lead the way in pioneering many digital initiatives, but what the local library implements to manage its ejournal collection, or even a more ambitious digital archiving project, will rarely scale to enterprise proportions. The University of Virginia, however, realized that even libraries need to contend with issues of scale up front, and it took the exponential growth of digital collections as a given. For the university, the challenge was not just a large volume of digital assets, but also the diversity of content types and the creation of a large quantity of meaningful metadata.
When the library set out to purchase a digital library solution four years ago, it investigated all of the usual suspects along with content management and asset management vendors, but each solution left it wanting. As Thornton Staples, director of digital library research and development at Virginia, describes it, "There are not good digital library systems out there. Everyone worries so much about reinventing the wheel, but if you don't have a good wheel to begin with you aren't reinventing anything."
So when Staples and Ross Wayland, associate director of the digital library's R&D, read a paper by Sandra Payette and Carl Lagoze at Cornell, they were looking for something brand-new in digital library thinking. They liked what they read. According to Staples, "We got their software and played with it. It was elegant, but it had no database and it was not optimized." So, after some retooling of the prototype, the University of Virginia earned a grant from the Andrew W. Mellon Foundation in September 2001 and formed a partnership with Cornell to develop the first digital object repository management system based on the Flexible Extensible Digital Object and Repository Architecture (Fedora).
Phase I of Fedora, the release of the first version of the system, was completed in May, though the team did not begin to publicize it until version 1.1 in August. The Fedora repository system is an open source digital object repository that exposes public APIs as Web services. "You can build a basic Fedora repository and manage and deliver stuff with version 1. It is not a digital library in a box, nor was it intended to be," says Staples. "If you are a library with no technical support, well, don't try this at home."
"Libraries tend to function without a lot of technical support," according to Staples, "so they think about archiving—preservation and access. But there's so much more a digital library can accomplish. You need to manage objects as if they are all the same, but you need to exploit objects as if they are unique. It's that tension that makes it really difficult." Undaunted, though, Staples says, "I just feel like no matter how hard and complicated the process is, we have to do it."
Fedora's strength lies not only in its ability to handle an enormous quantity of objects—Staples says they've scaled up to 10 million with little difference in performance from their initial half-million objects—but also in its ability to contain and disseminate information about those objects. The Fedora architecture is based on object models that by definition are templates for units of content, called data objects, which can include digital resources, metadata about the resources, and links to software tools and services that have been configured to deliver the content. As Staples puts it, "Most people think a digital library is about digital asset management. But the ultimate power is in interrelation of all of these things." With Fedora, he says, "You're building a very elaborate network of content; you are building nodes in a network that have relationships to each other."
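To make the object model described above more concrete, here is a minimal Python sketch of a data object that bundles digital resources, metadata, and links to delivery services, and that can be related to other objects. It is an illustration only: the class names, field names, identifiers, and service URL are assumptions made for this example and are not Fedora's actual API or storage format.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Disseminator:
    """Hypothetical link to a tool or service configured to deliver content."""
    name: str
    service_url: str


@dataclass
class DataObject:
    """Hypothetical unit of content: resources, metadata, and delivery links."""
    object_id: str
    resources: dict[str, str] = field(default_factory=dict)  # label -> location
    metadata: dict[str, str] = field(default_factory=dict)   # descriptive fields
    disseminators: list[Disseminator] = field(default_factory=list)
    related_ids: set[str] = field(default_factory=set)       # links to other objects

    def relate_to(self, other: DataObject) -> None:
        """Record a two-way relationship, making both objects nodes in a network."""
        self.related_ids.add(other.object_id)
        other.related_ids.add(self.object_id)


# Two illustrative objects with made-up identifiers and locations.
letter = DataObject(
    object_id="demo:1",
    resources={"text": "file:///archive/letter.xml"},
    metadata={"title": "Letter"},
    disseminators=[Disseminator("viewer", "http://services.example.edu/view")],
)
scan = DataObject(
    object_id="demo:2",
    resources={"image": "file:///archive/letter.tif"},
    metadata={"title": "Scan of letter"},
)
letter.relate_to(scan)
```

The point of the sketch is the one Staples makes: every object has the same outer shape, so it can be managed uniformly, while the resources, services, and relationships attached to it are what make each object distinct.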
Version 1.1 provides XML submission and storage, parameterized disseminators, basic access control and authentication, OAI metadata harvesting, a default disseminator, searching, and a batch utility. Version 2.0, which is expected to be released a year from January, will offer expanded policy expression and enforcement.
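As a rough illustration of what OAI metadata harvesting looks like from the harvester's side, the sketch below builds a request URL following the OAI Protocol for Metadata Harvesting, using its standard `ListRecords` verb and the `oai_dc` (Dublin Core) metadata prefix. The repository address is a placeholder, and the function name is made up for this example; the actual endpoint of any given Fedora installation is not stated here and would need to be looked up.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL (helper name is hypothetical)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Optional OAI-PMH argument restricting the harvest to one set.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Placeholder address, not a real repository.
url = list_records_url("http://repository.example.edu/oai")
print(url)
```

A harvester would fetch this URL and parse the XML response; the sketch stops short of the network call so that it stays self-contained.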
Fedora strives to have a "low barrier to entry" and, as such, can be readily adopted "as is" without much of the customization it offers. It can also be employed as a DAM system and allows for assets to be accessed by one or more client applications. And, of course, Fedora can be used for a complex digital library scheme, adding functionality like software tools for delivery and analysis along with higher levels of aggregation and information about the data objects. It can also be used to build a widely distributed repository.
Ultimately, though, Staples encourages potential users of Fedora to view it as the architecture, or even the wiring or plumbing, of a well-built content repository. It provides the needed information infrastructure so that information "flows," according to Staples.
While Staples believes that Fedora will offer a robust solution for both institutions and other organizations, he is quick to make clear, "We are not in the software business; we would really like to buy a system from someone." The university tried to interest a major digital library software maker in working on the project, but the company declined, feeling it was "too risky." Staples still has hope that a software vendor will see the potential of Fedora and make an investment in fully realizing that potential.
Later phases of the three-year project will focus on adding functionality relating to security and policy enforcement, reporting, monitoring, search, and interoperability among repositories. Future work by the development team will be aimed at producing software to support very large-scale, very efficient repositories of digital information.