Web Site Sued For Controversial Trip into Internet Past



Cyberspace doesn't give its travelers much room for reflection. Every day, millions of Web sites are updated, and older versions are erased from existence with the click of a button. Remember when Amazon.com sold only books? Or when WebCrawler ruled the search universe? The Wayback Machine does.

Wayback grabs these pieces of history before they disappear and puts them into a virtual Internet museum for historians, academics, and nostalgics to browse. Founded by Brewster Kahle, an early Web archiving pioneer, the Wayback Machine is a part of the Internet Archive, a nonprofit organization devoted to preserving data, texts, audio, Web sites, and other digital materials since the early days of the online revolution. Since 1996, the Wayback Machine has been sending out automated crawlers to all corners of the Internet and collecting archived digital copies of everything they encounter, making pages searchable by both URL and date. Archived collections allow users to see how the Web looked during important historical moments such as the 2000 presidential election or September 11, 2001, as well as critical technological junctures, such as the first Web page devoted to a streaming Web cam (the Trojan Room Coffee Machine) or an ancient version of eBay. Now, however, this nonprofit digital museum is embroiled in a lawsuit that has the potential to decide how we'll be able to look back at the Internet from the future.

With more than 1 petabyte of data already archived and nearly 20 terabytes being added each month, the Wayback Machine's vast archives aren't just for Web historians. Intellectual property lawyers have routinely accessed archived pages to prove or dispute legal claims. However, this legal use may prove to be Wayback's undoing. One company, Healthcare Advocates, Inc., cried foul after lawyers from the Philadelphia firm Harding, Earley, Follmer & Frailey used the Wayback Machine to defend a client, Health Advocates, against Healthcare Advocates' claims of trademark violation. After Healthcare Advocates lost the initial lawsuit, it filed another—this time including the Wayback Machine as a defendant, claiming that by allowing access to copyrighted, archived material, the Wayback Machine had violated provisions of the Digital Millennium Copyright Act (DMCA) and the Computer Fraud and Abuse Act.

The laws governing what can and cannot be cached, linked, or archived are pretty fuzzy. The Wayback Machine's policy, according to its Web site, is that it will exclude any site that uses a "robots.txt" file to block the Machine's crawlers. Web administrators use this file to keep robots of all kinds from picking up or linking to certain pages, but, according to information industry researcher and consultant Mary Ellen Bates, the robots.txt file is "completely a voluntary compliance. It's like a 'Do Not Disturb' sign on a hotel door." Search engines typically respect this request, and the Wayback Machine is no exception. However, according to Bates, it's "an opt-out rather than an opt-in" for Web site owners, meaning that the Wayback Machine will crawl and store copyrighted pages unless it is asked to stop. When it does encounter a robots.txt file that blocks it, the crawler will not only leave the page alone, it will go back into all archived versions of that page and remove them from public access.
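For readers who want to see what that voluntary check looks like in practice, here is a minimal sketch in Python using the standard library's robots.txt parser. It is an illustration under stated assumptions, not the Wayback Machine's actual code: the robots.txt contents, the example.com URLs, the crawler names, and the helper functions is_allowed and visible_snapshots are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary site (not taken from the lawsuit).
# The crawler name "ia_archiver" is an assumption used for illustration.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def visible_snapshots(snapshots, robots_txt, user_agent):
    """Keep only archived snapshots whose URL the current robots.txt allows.

    This mimics the retroactive policy described above: once a site opts out,
    older copies are hidden from public access as well.
    """
    return [s for s in snapshots if is_allowed(robots_txt, user_agent, s["url"])]


if __name__ == "__main__":
    # Hypothetical archive records for the imaginary site.
    archive = [
        {"url": "http://example.com/index.html", "date": "1999-05-01"},
        {"url": "http://example.com/private/notes.html", "date": "2001-09-11"},
    ]
    print(visible_snapshots(archive, ROBOTS_TXT, "ia_archiver"))
    print(visible_snapshots(archive, ROBOTS_TXT, "examplebot"))
```

Nothing in the sketch enforces the rules. A crawler that never calls the check can fetch the pages anyway, which is why Bates compares robots.txt to a "Do Not Disturb" sign.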

According to the lawsuit against Wayback, Healthcare Advocates had protected the pages in question with robots.txt, only to have the Harding, Earley, Follmer & Frailey firm make several hundred rapid-fire requests for the blocked pages through the Wayback Machine within minutes. In most instances, the robots.txt exclusion blocked the requests, but due to a bug in the system, a few got through, allowing the firm to access protected content via the Wayback Machine. The DMCA prohibits circumvention of protective technological measures. While robots.txt isn't legally binding, the legal question here appears to be one of intention: Did the Wayback Machine intentionally ignore the robots.txt file and allow unauthorized access to copyrighted material? Or did the lawyers at Harding, Earley, Follmer & Frailey know the system well enough to circumvent it?

Bruce Sunstein, a partner at the Boston-based law firm Bromberg & Sunstein, LLP, says that the legal grounding of the suit against the Wayback Machine seems "pretty weak . . . it began with a trademark beef and ended up with this sideshow." While in the short term the lawsuit seems to be more confusing than concerning to legal experts like Sunstein, the long-term implications might bring robots and their functions under the DMCA's scrutiny. While the copyright statutes provide for "fair use" exceptions, it's left for the courts to decide what constitutes "fair." Any decision against the Wayback Machine would draw a legal line in the sand for how and when copyrighted Web sites can be crawled and stored, and search engines and other sites that employ crawlers will be keeping a close eye on the outcome of the lawsuit.

According to Sunstein, "Copyright over time has had to be changed as different media for experiencing creative content has changed." The Wayback Machine has found itself at the center of the latest digital rights debate, but until the courts say otherwise, it will keep on crawling through cyberspace, letting users look through its portal into the Internet's past.

(www.archive.org/web/web.php)