Archiving the Attic



I was chatting online with a friend the other night, and she mentioned the lyrics of a Reba McEntire song from 1979. Within 15 seconds, I had the lyrics in front of me; my friend assumed that I had the song memorized. To be honest, even I was surprised that I found it so quickly, though I do assume that relatively obscure content from almost a quarter century ago is readily available. What fascinates me is that, while cataloging all the world's knowledge is impossible, so many people are willing to spend time cataloging their small corner of the world.

People following their passion—for lyrics, squished pennies, or Canaan Dogs—build much of the content on the Web. This results in a growing archive of information—some good, some mediocre, some just plain wrong—on any topic that people get excited about, and no, I'm not just talking about porn sites.

I remember lamenting a few years ago the fact that the Web does not have much of a memory, that information from more than a year or two ago just drops off the edge and disappears. I can no longer complain about the Web's "memory," at least when it comes to information that resides in someone's head or attic. Consider, for a moment, Project Gutenberg. It's a great idea: get volunteers to type in or scan-to-OCR classic texts that have come into the public domain. Gradually over the course of 30 years, PG has built up an archive of over 6,000 books. That's an impressive number and the volunteer effort required is staggering.

But now consider the size of the rest of the "historical" information that is distributed throughout the Web. eBay has given every Tom, Dick, and Harriet an incentive to rummage through their closets, photograph and describe in loving detail their record collection, old telephone sets, or 1950s dinnerware, and make this information available digitally. And, as every dedicated Deadhead knows, there has long been an extensive network of people who trade tapes of Grateful Dead shows. Now the Web has given us the incentive to catalog our Dead tapes and set lists and put that information online. Anyone can find out how often the Dead played "Quinn the Eskimo" as an encore, or when they first performed "Reuben and Cérise."

While the Web has been text-rich for years, finding graphic images has been more difficult, particularly since search engines are not able to recognize the content of images, so their indexing is haphazard at best. But with the rise of Webcams, digital cameras, eBay and other electronic garage sales, and smarter search engine algorithms, it is now possible to identify royalty-free images of everything from the Eiffel Tower to ice crystals. And, as with Reba McEntire lyrics or Fiesta dinnerware, the content has been created by individuals loading the 15 or 20 images of something they care about, not by a dot-com eyeing a business opportunity. Yes, the content quality varies from authoritative to downright wrong, but here is where the popularity ranking used by search engines comes in handy. I may not know which Web site about original Fiestaware is authoritative and which one is hopelessly incomplete. All I need to know is that most relevance-ranking algorithms factor in the number of links to a given page, so I trust that people who really know Fiestaware (Fiesta-heads?) will have evaluated the dinnerware sites and have linked to the ones they consider most reliable.

When I talk with econtent providers about the impact of the increasing depth of the Web on their businesses, I often hear the same response: How can you make a business decision based on content that may be transitory, whose provenance you may not be able to determine, and which has not been vetted by an editor or publisher? For many information needs, though, all that a searcher is looking for is an answer, not necessarily the answer. I often need just enough information to reduce my uncertainty, which I can accomplish via the Web.

The more significant challenge to econtent companies is that this attitude of "I can find a good enough answer on the Web" often carries over into our more serious research. Many open Web users, including yours truly, can find something that is well-formatted, appears to be current, and—while perhaps not complete—at least satisfies an information need. For many econtent customers, Web results look more and more complete all the time. We are finding deep, rich repositories of information. We are finding current articles, images, other graphics, and access to databases of content never available before through fee-based information sources.

The value-added online services offer unique content, powerful search tools, the means to deliver highly targeted information directly to the information consumer who needs it, and deep archives that remain stable over time. Our job as info pros is to raise the information expectations of our clients and patrons, so that they, too, know when to use eBay and when to use the high-end information marketplace.