Metadata--Think outside the docs!

May 03, 2005

In the age of the intelligent search engine, the importance of metadata is called into question. It seems that Google can find everything we need by sending its robots to crawl around inside all of our documents. Why bother with the hard work of categorizing, classifying, and tagging each document with metadata that's stored outside the document in a database? Or worse, with metadata buried in XML/RDF tag attributes in a stored version of the document that is rarely served as is, so today's search engines never see the expensive metadata?

The world of librarians (now repositioned as Information Architects) keeps telling us that their categorizing skills are critically important to organizing information as knowledge. In their terminology, alphabetical subject lists (like the Library of Congress Subject Headings) and classification schemes (like the Dewey Decimal Classification) allow for "precoordinate" indexing of all the world's documents. Precoordinate means the search strings these library experts use to help us find what we are looking for are prepared ahead of time by teams of experts.

But in the age of Google our search strings are "postcoordinate." This means we assemble our search strings as we think of them and query Google to see what comes back. We are all do-it-yourself reference librarians. If we're clever we use advanced search techniques and combine search terms with Google's version of Boolean logic, for example: Mercury planet -car -element -god.
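To make the Mercury example concrete, here is a minimal sketch of how a postcoordinate query with excluded terms might be evaluated against a document. It is not how Google works internally; the function names parse_query and matches are hypothetical, and the matching is a simple whole-word test assumed for illustration.

```python
import re


def parse_query(query):
    """Split a query into required terms and excluded (-prefixed) terms."""
    required, excluded = set(), set()
    for token in query.lower().split():
        if token.startswith("-") and len(token) > 1:
            excluded.add(token[1:])
        else:
            required.add(token)
    return required, excluded


def matches(query, text):
    """Return True if text has every required term and no excluded term."""
    required, excluded = parse_query(query)
    # Assumption: a "term" is a run of letters or digits, compared in lowercase.
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return required <= words and not (excluded & words)


QUERY = "Mercury planet -car -element -god"
print(matches(QUERY, "Mercury is the planet closest to the Sun"))
print(matches(QUERY, "Mercury, the Roman god, lends his name to a planet"))
```

The searcher coordinates the terms at query time; nobody had to file the page under "Mercury (planet)" in advance.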

With all kinds of studies showing that postcoordinate searching is retrieving the right information for 80 to 90% of users, it often seems superfluous to invest the kind of money necessary to tag our docs with metadata to reach those last few customers. If you're a $100 million business, adding a few percent to revenues handily covers the cost of the really fancy taxonomy and metadata strategies. But even if you are a small operation, improving your clients' access to the information they need translates into customer satisfaction and quality of service. That may let you keep the business you have before an off-shore competitor spirits it away with a faster, better, metadata-enabled Web site.

MARC, Please Meet Your Party on the Web
Metadata lets intelligent computer programs find the meaning of your content, beyond that discoverable by examining the documents themselves. Librarians called it machine-readable cataloging (MARC). Tim Berners-Lee calls it the Semantic Web. The question the financial types are asking is: when will commonly available Web tools exploit that extra meaning to deliver better information to your audience?

The short answer is: not soon enough to provide a measurable ROI, unless your Web site and intranet provide custom retrieval and navigation tools for your users. Don't invest in a huge metadata design and implementation unless you invest a comparable amount in your own search engine, or in a sophisticated adaptive navigation scheme that exploits the costly metadata with a user interface that your customers actually use.

The good news is that these tools can provide measurable results if they include complete logging of all the search and navigation efforts by your users. The bad news is that looking at the performance metrics may reveal that virtually no one uses your fancy new tools.

DIY Metadata
So what about the relatively inexpensive metadata provided for by the <meta> element in the header of every HTML page? Well-motivated Web page designers have augmented the visible part of millions of Web pages (the stuff between the <body> tags is the real content of your document) with invisible <meta> information like keywords and descriptions.
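For readers who want to see what this invisible information looks like to a program, here is a small sketch that pulls the name/content pairs out of a page using Python's standard html.parser module. The sample page, the MetaExtractor class, and the extract_meta helper are all made up for illustration; a search tool you control could read the fields in some similar way.

```python
from html.parser import HTMLParser

# Hypothetical page: the <meta> tags sit in the header, invisible to visitors.
SAMPLE_PAGE = """<html><head>
<title>Pain relief</title>
<meta name="keywords" content="Tylenol, acetaminophen, pain relief">
<meta name="description" content="Dosage information for Tylenol.">
</head><body><p>Visible content.</p></body></html>"""


class MetaExtractor(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


def extract_meta(html):
    """Return a dict mapping each meta name to its content string."""
    parser = MetaExtractor()
    parser.feed(html)
    return parser.meta


print(extract_meta(SAMPLE_PAGE))
```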

You could use the <meta> tag to easily implement one of the most important but overlooked uses for metadata: search term explosion. To do this, you create a synonym ring, a list of terms that are essentially equivalents of the terms explicitly in your document, including abbreviations, acronyms, and even misspellings. When your visitor types in "tilenol," your metadata check tells the search engine to serve the page with Tylenol on it.
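A synonym ring is simple enough to sketch in a few lines. The version below assumes you run your own search code and keeps the ring in a Python list rather than in a <meta> tag; the ring contents, page paths, and function names are hypothetical, and the matching is a plain whole-word test.

```python
import re

# Hypothetical synonym ring: terms treated as equivalent, misspellings included.
SYNONYM_RINGS = [
    {"tylenol", "tilenol", "tylenal", "acetaminophen"},
]


def build_lookup(rings):
    """Map every term to the full ring it belongs to."""
    lookup = {}
    for ring in rings:
        for term in ring:
            lookup[term] = ring
    return lookup


def expand_query(query, lookup):
    """'Explode' each query term into all members of its synonym ring."""
    expanded = set()
    for term in re.findall(r"[a-z0-9]+", query.lower()):
        expanded |= lookup.get(term, {term})
    return expanded


def search(query, pages, lookup):
    """Return the pages whose text contains any of the expanded terms."""
    terms = expand_query(query, lookup)
    hits = []
    for url, text in pages.items():
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        if terms & words:
            hits.append(url)
    return hits


PAGES = {
    "/tylenol.html": "Tylenol dosage chart",
    "/aspirin.html": "Aspirin dosage chart",
}
print(search("tilenol", PAGES, build_lookup(SYNONYM_RINGS)))
```

The visitor's misspelling never appears on the page; the ring is what connects "tilenol" to the page about Tylenol.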

This is easy to do and incredibly valuable when it retrieves what your visitors are looking for. Without it, you simply lose the business. But again, this metadata will be wasted unless you are in control of your search technology. Adding synonyms to your <meta name="keywords"> tag does not help with most public search engines. The problem is abuse of the <meta> tag by aggressive Web-page designers who misrepresent the contents of their pages to improve their search engine positioning. They have poisoned the metadata well.

Potentially poisoned or not, the bottom line is that metadata management is an integral part of sophisticated content management, but only if you control the complete user experience.