Scholarly research has come a long way since the days of poring over library stacks, and search engine companies are beginning to explore the opportunities in academic research. The Massachusetts Institute of Technology (MIT), one of the most venerated American institutions of higher learning, made its own foray into this area with DSpace, a joint project with Hewlett-Packard launched in 2002. DSpace is open-source software designed to help colleges and universities create, manage, and maintain digital repositories. About 125 schools currently use the software, but until now no tool has enabled searching across repositories rather than just within them. Enter Google.
Google and 17 partner schools have joined forces on a pilot program to enable searching across DSpace repositories. Besides MIT, the 16 other participating universities are Australian National University, Cornell University, Cranfield University, European University Institute, Hong Kong University of Science and Technology, Indiana University-Purdue University at Indianapolis, Minho University, the Ohio State University, University of Arizona, University of Calgary, University of Oregon, University of Parma, University of Rochester, University of Toronto, University of Washington, and University of Wisconsin.
The number of documents available for searching has been one point of contention for DSpace. An April 2004 article in The Chronicle of Higher Education reported a DSpace estimate that the 17 participants each had an average of 1,000 papers in their digital archives. In Between, "a weblog on scholarly online publishing, open access, and library related technology," published a look at the documents available as of April 2004. While some universities had considerably more than 1,000 documents (MIT had 3,565, though some had limited availability, and the Australian National University had 34,050, though none as texts), most hovered around 100 and many had considerably fewer.
For the pilot program, Google and DSpace have enlisted the Online Computer Library Center (OCLC) to facilitate searching by acting as a middleman between Google and the participating schools. All DSpace documents are tagged with metadata so that Google can sort through them more efficiently. The Handle system that DSpace uses to identify documents can be difficult for Google to manage, however, so OCLC plans to gather DSpace metadata regularly and convert it to formats that Google can use more easily.
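To make the middleman step concrete, here is a minimal sketch of what such a conversion might look like. It is not OCLC's actual process, which has not been described publicly. It assumes the harvested metadata arrives as Dublin Core-style XML and that the crawler-friendly format is a plain HTML page with an ordinary link routed through the public Handle proxy at hdl.handle.net. The sample record, its handle, and the function names handle_to_url and record_to_html are invented for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Standard Dublin Core element namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical harvested record; the handle and field values are invented.
SAMPLE_RECORD = """\
<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Sample Working Paper</dc:title>
  <dc:creator>Doe, Jane</dc:creator>
  <dc:date>2003-05-01</dc:date>
  <dc:identifier>hdl:0000.0/42</dc:identifier>
</record>
"""


def handle_to_url(handle):
    """Turn a handle such as 'hdl:0000.0/42' into a link via the Handle proxy."""
    if handle.startswith("hdl:"):
        handle = handle[len("hdl:"):]
    return "http://hdl.handle.net/" + handle


def record_to_html(record_xml):
    """Convert one metadata record into a simple, crawlable HTML page."""
    root = ET.fromstring(record_xml)

    # Collect Dublin Core fields by name; a field may repeat.
    fields = {}
    for element in root:
        if element.tag.startswith("{" + DC_NS + "}"):
            name = element.tag.split("}", 1)[1]
            fields.setdefault(name, []).append((element.text or "").strip())

    title = fields.get("title", ["Untitled"])[0]

    # Descriptive fields become <meta> tags.
    meta_lines = [
        '<meta name="DC.{}" content="{}">'.format(escape(name), escape(value))
        for name, values in fields.items()
        for value in values
        if name != "identifier"
    ]

    # Handles become ordinary hyperlinks a crawler can follow.
    links = [
        '<a href="{0}">{0}</a>'.format(escape(handle_to_url(value)))
        for value in fields.get("identifier", [])
    ]

    return "\n".join(
        ["<html><head>", "<title>{}</title>".format(escape(title))]
        + meta_lines
        + ["</head><body>"]
        + links
        + ["</body></html>"]
    )


if __name__ == "__main__":
    print(record_to_html(SAMPLE_RECORD))
```

The point of the sketch is the division of labor: the repository keeps its persistent identifiers, and the intermediary publishes a flat page whose links and descriptive tags a general-purpose crawler can read without knowing anything about handles.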
Although both sides have been tight-lipped about the project, representatives from DSpace have said that the agreement with Google is not exclusive and that they are open to working with other search engine companies or even developing their own technology. Plans with Google continue to move forward, though, and if all goes well with the pilot, Google may launch the program under its Advanced Search section within the next few months. Other schools are encouraged to participate, and DSpace hopes eventually to include all 125 colleges and universities in the program.