Scanning the Stacks: The Digital Rights Issues Behind Book Digitization Projects

Page 2 of 3



Publishers Find a Friend in Yahoo!
It was only a few months after Google unveiled Google Print Library that Yahoo! grabbed the spotlight with its own plan for searching books (although Yahoo! claims that it had one in the works a few months before Google Print Library was announced). Yahoo! and the Open Content Alliance (OCA), an organization of 34 members (and growing) committed to building an online content resource based on an open-source software development model, announced that they were beginning to scan works in the public domain and unveiled the first thousand books on the Internet Archive's Web site.

The OCA is a nonprofit group that depends on the cooperation of the Internet, publishing, and library communities, and new members are signing on as word spreads. Charter members of the OCA include the University of California, the University of Toronto, Adobe, the European Archive, the National Archives of England, O'Reilly Media, and Hewlett Packard Labs. 

Yahoo! will provide funding to scan 18,000 volumes in the University of California Americana collection (19th-century American books that are out of copyright). The Internet Archive, a nonprofit Web page index headed by Brewster Kahle, will take over scanning duties for the OCA, and libraries that take part can have their books scanned by the OCA for 10¢ a page, a bargain for most libraries looking to get their digital collections off the ground. The OCA hopes to have a complete collection of out-of-print works and those in the public domain, but plans to expand to include copyrighted content from publishers who actively opt into the program. Unlike Google, however, the OCA book project will allow users to read extended portions of copyrighted texts, even, in some cases, full copies. In another departure, the OCA's collection can be searched by any search engine, including Google's.

The OCA has been endorsed by a number of academic and commercial publishers, including those currently suing Google. Association of American Publishers president and CEO Pat Schroeder has praised Yahoo!'s book digitization plan for doing it "the right way." Linda Golodner, president of the National Consumers League and author of a letter to Congress calling for an investigation of Google Print, also endorsed the OCA project: "Programs that give creators the capacity to choose how their material will be used [are] ultimately the best to accomplish the free flow of information." 

Microsoft Joins the Fray
When Microsoft hatched its own digital book scheme last fall, it structured its plans to resemble Yahoo!'s publisher-friendly model much more than Google's. Like Yahoo!, Microsoft will join the OCA and use the books scanned by the Internet Archive. However, it will make the collection available through its own book search engine, MSN Book Search, in a move that reflects Microsoft's interest in entering the online portal market. So far, Microsoft has pledged an estimated $5 million to pay for the digitization of 150,000 books. In November, it began scanning about 100,000 out-of-copyright works from Oxford University's library. Both Microsoft and Yahoo! are currently exploring pay-by-the-page and other business models to give publishers a financial incentive to turn over their copyrighted work for free.

Legal Challenges to Book Digitization
Maybe it's the talk of Google being poised to unseat Microsoft as America's largest Internet company, maybe it's the company's $400-per-share stock price, or maybe it's the simple fact that Google gives it all away for free, but one thing's for sure: Google has rattled quite a few cages with its ambitious plan. "Americans like winners, but not really big winners," says Marc Strohlein, vice-president and lead analyst at Outsell, Inc., an information industry research and advisory firm. "The lawsuits against Google will inevitably draw attention to other practices, most notably caching of Web pages and, in fact, that is already happening with the Internet Archive," which is currently facing a copyright infringement lawsuit of its own.

While a number of organizations have come out against the Google Print Library Project, Google maintains that its use of the copyrighted material is wholly legal, and that its intention is good enough to outweigh potential harms. Hilton from the University of Michigan agrees, saying, "The fact that Google is a corporate entity does not change the nature of the project nor the public benefit it achieves." 

How Much is a Snippet?
Courts decide case by case whether a given use of copyrighted material qualifies as fair use, weighing factors such as the nature of the copyrighted work, how much of it is used, and the purpose of the use. Google argues that the snippet (anywhere from a sentence to a paragraph) of text it displays for each search result presents no threat to book sales; in fact, Google claims that it will actually help publishers and authors collect more royalties by introducing the book to potential customers and pointing them to places at which they can purchase it. Libraries and universities are also allowed by law to make and distribute digital copies on their own under certain circumstances, as long as the copies are not intended for commercial use.

Google Print Library might position itself legally as an archive, but the Authors Guild contests this in its class-action lawsuit against Google. According to the Guild's complaint, "Google plans to use the Works from the library of the University of Michigan in order to attract visitors, and thereby advertisers, to its Web site." The Authors Guild is seeking statutory damages on behalf of its entire membership for alleged copyright infringement as well as a court order to prevent Google from scanning copyrighted material in the future. 

The Association of American Publishers (AAP) is accusing Google of "massive, wholesale, and systematic copying of entire books still protected by copyright for public distribution and public display," according to the lawsuit brought on behalf of five of its nearly 300 members: McGraw-Hill, Pearson Education, Penguin, Simon & Schuster, and John Wiley & Sons, Inc.

AAP head Schroeder, a former congresswoman from Colorado, doesn't buy Google's argument that it's only displaying a snippet of text. "Snippet is not a legal term, it's a Google term," she says, pointing out that the so-called snippet actually reveals up to several pages from the text. "What it basically is, is feudalism," Schroeder explains, in which wealthy search engines take the work of content creators to make a profit in a move that she calls an act of "rogue eminent domain."

On October 20, the National Consumers League (NCL) joined the fight to prevent Google Print Library from scanning copyrighted material. NCL president Linda Golodner sent letters to Representative Lamar Smith, of the House Subcommittee on Courts, the Internet, and Intellectual Property, and Senator Orrin Hatch, urging the lawmakers to hold "timely public hearings on this issue of great public, legal, and cultural significance." The letter from the NCL contends that, while Google Print Library is "set[ting] the stage for a quantum leap in consumer access to information," it has not thoroughly addressed copyright issues, its obligation to content creators, and the cultural impact its collection will have on the literary landscape.

According to Golodner, this is "important to consumers because of the need to balance the interests of broad access to print and copyright protection to preserve and reward creative talent . . . [The] NCL thinks that all sides should fully air and examine this matter before it is too late—let them all have their say."        
