The Data Conversion Process
HTC Global Services, Inc., operating out of India, handles data conversion for ECCO; thus far, it has converted 12,000 reels of microfilm for the project. HTC receives batches of prepared images from Gale's offices in Woodbridge, Connecticut, where film duplication, metadata preparation, and quality assurance are carried out. Images are scanned at 300 dpi and split into individual pages for proofing. The digital files are then run through optical character recognition (OCR) technology to check for errors and correct images. Final pages are then converted into structured XML before returning to Woodbridge for further quality control. More than 1,000,000 pages a week are checked for errors or other imperfections at Gale's Woodbridge offices. Eight locations on every page are checked, and if they pass inspection, the page is assumed to be correct. If there are problems at any of those eight locations, a more detailed examination follows and the pages may be returned for rescanning.
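The spot-check rule described above can be sketched in a few lines of Python. This is only an illustration of the pass/fail logic, not Gale's actual tooling; the function name and the idea of recording each checkpoint as a boolean are assumptions made for the example.

```python
from typing import Sequence

# The article mentions eight locations checked on every page.
CHECKPOINTS_PER_PAGE = 8


def needs_detailed_review(checkpoint_ok: Sequence[bool]) -> bool:
    """Return True when any spot check fails, so the page gets a fuller examination.

    `checkpoint_ok` is a hypothetical record of the eight spot checks,
    with True meaning that location passed inspection.
    """
    if len(checkpoint_ok) != CHECKPOINTS_PER_PAGE:
        raise ValueError(f"expected {CHECKPOINTS_PER_PAGE} checkpoint results")
    return not all(checkpoint_ok)


# Made-up sample pages: one clean, one with a single failed checkpoint.
clean_page = [True] * 8
flawed_page = [True] * 7 + [False]
```

A page whose eight checkpoints all pass is assumed correct; a single failure is enough to trigger the closer look, and possibly a rescan.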
Because English spelling had not been standardized in the eighteenth century, the conversion had to contend with variant spellings and alphabetic characters that are no longer in use. Two major challenges involved the characters Æ and ƒ, which were common in the eighteenth century but are rarely used now and confuse the scanning technology. To catch such misreadings, Gale also employs some manual checking.
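One common way to keep archaic letterforms from hurting search is to fold them to modern equivalents before indexing. The sketch below is an illustration of that general technique, not a description of Gale's process. It assumes the second character mentioned above is the long s (ſ, U+017F), which closely resembles an f, and the mapping table and function name are hypothetical.

```python
# Hypothetical mapping from archaic letterforms to modern search equivalents.
ARCHAIC_TO_MODERN = str.maketrans({
    "\u017f": "s",   # long s (ſ), easily misread as f
    "\u00c6": "AE",  # capital ligature Æ
    "\u00e6": "ae",  # lowercase ligature æ
})


def normalize_for_search(text: str) -> str:
    """Fold archaic characters and case so old and modern spellings match."""
    return text.translate(ARCHAIC_TO_MODERN).lower()
```

Under this scheme a query for "congress" would match a page printed as "Congreſs", provided both the query and the page text pass through the same function.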
Resolving the Napoleon Complex
Gale has taken particular care to make search as functional as possible for users. Texts in the eighteenth century commonly contained running heads along the page, which can seriously skew full-text searching. To alleviate the problem, Gale set the running heads as a separate field; Nunn estimates that this alteration to improve search cost Gale an additional half-million dollars or so, but they felt that it was worth the expense.
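Why running heads skew full-text search is easy to see in a small sketch. The record layout, field names, and sample pages below are invented for illustration and do not reflect ECCO's actual XML schema; the point is only that a title repeated at the top of every page matches a query on every page unless it is held in its own field.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    number: int
    running_head: str  # kept apart from the body, as the article describes
    body: str


def search_body(pages: List[Page], term: str) -> List[int]:
    """Match only body text, ignoring the running head."""
    needle = term.lower()
    return [p.number for p in pages if needle in p.body.lower()]


def search_all_text(pages: List[Page], term: str) -> List[int]:
    """Match running head and body together, as naive full-text search would."""
    needle = term.lower()
    return [p.number for p in pages
            if needle in f"{p.running_head} {p.body}".lower()]


# Made-up sample pages sharing one running head.
pages = [
    Page(1, "A History of Napoleon", "The campaign in Egypt began in 1798."),
    Page(2, "A History of Napoleon", "Napoleon returned to France the following year."),
    Page(3, "A History of Napoleon", "The fleet remained at anchor."),
]
```

Here `search_all_text(pages, "napoleon")` reports all three pages, while `search_body(pages, "napoleon")` reports only page 2, the one page whose text actually mentions the name.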
Users can search for terms in author, title, front matter, keywords, and a number of other fields to narrow their search, but once a results page has been generated, the user cannot narrow the search further. This can be frustrating, since a search of 33 million pages is apt to return thousands of possible results.
ECCO's other features are extensive and designed to be user-friendly. Users can choose the scale of the image by percentage. They can rotate images, which is particularly useful for maps, and then rotate them back to the original position, a feature requested by Gale employees who found it frustrating not to be able to do so. Each text is accompanied by a separate list of illustrations that users can browse without skimming the entire document. Users can email a citation, print one to a PDF or to a printer, and mark search results for personal reference.
Throughout every facet of the process, Gale's editorial philosophy has been, "if you're going to do something this vast and this expensive, you should do it right," according to Nunn. Although ECCO is most definitely a research tool, Nunn is quick to say that "this is not a productivity tool," as users are apt to end up meandering through the system, discovering threads of interest they did not originally intend to follow. "I can see it as being inviting to beginning students and novice scholars, who will be able to trace concerns and issues across texts in many disciplines," adds Tofanelli. "It invites researchers at every level to see texts of the eighteenth century in relationship to one another and to explore those relationships."
Since the material is primarily British (approximately 80 percent of the material is from the British Library), ECCO offers a historical perspective that Americans may not be used to, particularly regarding the American Revolution. "Think about it," says Gale president Allen Paschal. "If you're researching the American Revolution and bringing in the British perspective, you offer a more well-rounded research environment and that is what people want."
Gale describes ECCO as, "The most ambitious single digitization project ever undertaken deliver[ing] every significant English-language and foreign-language title printed in Great Britain, along with thousands of important works from the Americas." With 33 million pages and counting, ECCO seems to be delivering on its goal; keep an eye out for ECCO in a library near you.