Scanning the Stacks: The Digital Rights Issues Behind Book Digitization Projects

Page 1 of 3

Ask fans of the dearly departed free file-swapping software Grokster: if digital content sounds too good to be true, or too cheap to be legal, it probably is. Building a free digital library might not seem like an audacious move at first glance, but when three major Internet companies each aspire to create the biggest, most widely accessible library ever, copyright watchers the world over take notice.

Google's recent efforts to bring the entire collections of major research libraries like those at the University of Michigan and Harvard online through its free search engine have earned admiration from some parties and intense legal scrutiny from others. Two lawsuits challenge the legality of Google's latest business endeavor, delaying Google's scanning process long enough to give search engine rivals Yahoo! and Microsoft an opportunity to roll out their own book digitization projects. Yahoo! and Microsoft have taken a different approach, sidestepping Google's copyright quagmire by playing by publishers' laundry list of rules and limiting the scope of their libraries to unprotected books in the public domain. As Google, Yahoo!, and Microsoft gear up for the grueling scanning process ahead, the debate continues over what constitutes "fair use," what copyright is meant to protect, and what we'll be able to find on the shelves of the digital library in the future.

Google Print, one of many new services Google has introduced as extensions of its popular search engine, was announced in December 2004. The project is divided into two parts: Google Print Publisher and Google Print Library. Print Publisher has been met with enthusiasm from a number of publishers, whereas Google Print Library has been put under the microscope for playing fast and loose with its interpretation of copyright law.

In Google Print Publisher, publishers can sign up for free to submit their books for inclusion in Google's search index and receive half the revenue from contextual ads that Google pairs with search results. When a search matches a book in Google Print Publisher, a bibliographic record appears, and users can view the page on which the search term is located, plus up to two pages on either side of it. Also displayed with search results are links to Web sites selling the book, including the publisher itself along with online book vendors (one of which is developing its own search-between-the-pages function as well as a pay-to-read digital book program). Although Google scans the full text of each book and stores it on its servers, a few pages are purposely excluded, and users cannot print or copy images.

Google Print Publisher has received largely positive reactions from publishers, authors, and users alike. Penn State Press, a nonprofit, scholarly publisher, agreed to put a significant portion of its catalog into Print Publisher during test stages of the program, and Tony Sanfilippo, marketing and sales director at Penn State Press, said that he would recommend it to his nonprofit and commercial peers. Sanfilippo admits that Penn State Press "hasn't seen much money" from the advertising revenue that Google promises. However, the press reports that sales of some slow-selling hardcover volumes have increased dramatically since they were added to the Print Publisher database, in some cases going from 1-2 copies a year to 15-20 copies of the same book every quarter.

In Google Print Publisher, publishers have a proactive say as to which of their books are scanned, but with Google Print Library, Google delved into the stacks of major libraries at the University of Michigan, Harvard University, Stanford, Oxford, and the New York Public Library and began scanning the collections without regard to copyright status. Google provides the labor and financial backing (typically about 10¢ per page) in exchange for access to the books, and it creates two digital copies, one going into Google Print Library and the other going to the participating library. Google will spend an estimated $200 million to scan and index 15 million books by 2015.

While many libraries and publishers have begun their own digitization efforts, scanning millions of books isn't cheap. Professor James Hilton, associate provost and interim university librarian for the University of Michigan, says that his library has been collaborating with Google on the Print Library Project since 2002. According to Hilton, without Google's help, "it would have taken over 1,000 years to digitize our current collections," but with Google's technology, all seven million volumes will be online in six years. He reaffirms Google's belief that their partnership falls within the spectrum of fair use: "The goal of a research library like ours is to secure, preserve, and archive knowledge—all of it, because . . . if we don't preserve and protect it, no one else will." 

Sanfilippo of Penn State Press, on the other hand, sees "no distinction" between digital copies and print copies. If Google makes a copy and gives it away to the University of Michigan, he argues, the publisher doesn't get the chance to sell digital copies of its own material to libraries.

Google Print Library allows publishers to opt out of participation and honors all requests to exclude certain protected works. Google points to the large number of works in the public domain, orphaned works, and out-of-print titles, saying that its primary goal is to allow users to discover books, not necessarily to read them online. Hilton agrees, citing the large number of works "where the owner of a copyright work is simply not knowable or can no longer be asked for permission. I've seen estimates that this class of works reaches into the millions." 

After Google announced its intention to scan library books, the Authors Guild and the Association of American Publishers (AAP) filed separate lawsuits alleging that Google was violating the Copyright Act by reproducing copyrighted material for commercial gain. While attempting to negotiate a compromise with members of the AAP over the summer, Google voluntarily agreed to stop scanning copyrighted material.

The AAP proposed a compromise based on the unique ISBN that has been assigned to every book published since 1967. Using ISBNs, the AAP argued, Google could determine which works are under copyright and contact the listed publisher and author to obtain permission before scanning. When Google rejected the ISBN proposal, talks broke down. Google resumed scanning the books in question as scheduled on November 1. The courts have asked Google to prove by November 30 that its plans constitute fair use.
