To Metadata or Not To Metadata

Page 1 of 3

To metadata or not to metadata, that is the question.
Whether 'tis nobler in the mind to suffer the slings and arrows of outrageous search results
Or to take up metadata against a sea of irrelevance
And by organizing them, find them?

With all due apologies to the Bard, the questions of whether to add metadata to unstructured content and how much effort is really justified to do so have been raised with increasing frequency and vigor in the last year.

These issues and more were explored last year at the Dublin Core Metadata Initiative (DCMI) 2003 Workshop. While some participants argued for a drastic reduction in metadata efforts or at least rethinking those efforts, other participants offered new ideas of how to create valuable metadata and how to generate value from metadata.

A couple of things have become increasingly clear: metadata is not going away, and there is no single simple solution to how to add metadata and maximize its value. Consequently, this article takes a look at some of the basic issues around adding metadata to unstructured content and explores a range of approaches that various groups and software vendors are trying. We will then examine how a broader view of metadata, beyond simply adding keywords to documents, is leading to a more sophisticated, multidimensional or infrastructure-based approach that supports a smarter balance of both more and less metadata.

Too Hard, Too Much, No Help
A number of issues have been raised about the effectiveness and value of adding metadata. The first is cost. The second is the difficulty of doing it well, along with the associated problem that poor-quality metadata can actually make search worse than no metadata at all.

Let's start with the cost argument. One participant at DCMI, Mike Doane, senior content analyst at SBI and Company, cited his practice, in which he charged between $150,000 and $250,000 for a full-scale metadata implementation. This can certainly seem like an exorbitant amount of money, especially for a company that is still using a $10,000 search engine for its intranet. Moreover, this expense covers only adding metadata to a large existing content repository; it doesn't take into account the additional cost of maintaining existing metadata and adding new metadata.

In addition to cost, another argument against adding metadata is the immense difficulty of doing it well. From my own experience and that of others, the difficulty of effectively employing metadata can easily be seen in the abysmal quality of the metadata associated with the unstructured content found on most corporate intranets. In evaluating corporate intranets, time and again we find missing metadata fields, missing values in the fields that have been defined, very poor-quality values in even such simple fields as the title (ex23a.pdf is not very illuminating as a document title), inconsistent values among similar documents, and inconsistent values among authors. One interesting aspect of bad metadata is that it does more than reduce the value gained from the effort to add it. If you tweak your search engine to use metadata such as keywords in ranking, and someone enters bad metadata, the document might be lost forever, or it won't turn up when someone searches for a relevant term but will turn up for an inappropriate one and promptly be ignored.
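
To make these quality problems concrete, here is a minimal sketch of the kind of check an intranet audit might run on each document's metadata. It is only an illustration: the field names in REQUIRED_FIELDS, the file-name heuristic, and the sample records are all assumptions made for this example, not part of any standard or product, and all values are assumed to be plain strings.

```python
import re

# Hypothetical required fields; no particular metadata schema is implied.
REQUIRED_FIELDS = ("title", "author", "keywords")

# Heuristic (an assumption): a title that is just a file name, such as
# "ex23a.pdf", tells the searcher almost nothing about the document.
FILENAME_LIKE = re.compile(
    r"^[\w\-]+\.(pdf|docx?|xlsx?|pptx?|txt|html?)$", re.IGNORECASE
)


def audit_record(record):
    """Return a list of human-readable problems found in one metadata record."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not str(record[field]).strip():
            problems.append(f"empty value: {field}")
    title = str(record.get("title", "")).strip()
    if title and FILENAME_LIKE.match(title):
        problems.append(f"file-name title: {title}")
    return problems


# Made-up sample records, for illustration only.
documents = [
    {"title": "ex23a.pdf", "author": "J. Smith", "keywords": ""},
    {"title": "2003 Travel Expense Policy", "author": "HR"},
]

for doc in documents:
    print(doc.get("title"), "->", audit_record(doc))
```

A check like this catches only the mechanical failures (missing fields, empty values, file-name titles). Inconsistent values among similar documents or among authors would require comparing records against each other or against a controlled vocabulary.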

Given the rather pathetic record that many metadata efforts have racked up, it is little wonder that organizations have begun to question the entire value of adding metadata. However, there is another side to the story. First, the cost of adding metadata can be reduced in several ways. For example, the $200K for a metadata initiative performed by outside consultants can be greatly reduced by not starting from scratch in each case, but rather starting with existing metadata standards and controlled vocabularies and taxonomies. The cost of a unique custom job will always be higher than one that at least starts with predefined components.

In addition, the cost of doing metadata has to be weighed against the cost of not doing metadata. Assume for the moment that adding metadata would solve all of the problems associated with search. One estimate from IDC puts the cost of bad search at $6 million for a 1,000-person company. Now, it is unlikely that adding metadata will solve all search problems, but even if it solves only half, that is still a savings of $3 million per year. In this context, $200,000 for metadata doesn't seem so exorbitant.
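
The arithmetic behind that comparison can be laid out in a few lines. This is a back-of-the-envelope sketch that uses only the figures quoted above. It treats the $6 million as an annual cost, as the "per year" savings figure implies, and it ignores the ongoing cost of maintaining metadata, so the variable names and the break-even calculation are illustrative assumptions rather than a real cost model.

```python
# Figures quoted in the article; the 50% share is the article's own "what if".
cost_of_bad_search = 6_000_000   # IDC estimate for a 1,000-person company
share_solved = 0.5               # assume metadata fixes half of the problem
metadata_cost = 200_000          # consultant-led metadata initiative

savings = cost_of_bad_search * share_solved
net_benefit = savings - metadata_cost

# Share of the search problem metadata must solve just to pay for itself.
break_even_share = metadata_cost / cost_of_bad_search

print(f"Savings: ${savings:,.0f}")
print(f"Net benefit: ${net_benefit:,.0f}")
print(f"Break-even share of search problems solved: {break_even_share:.1%}")
```

On these numbers, the initiative pays for itself if metadata solves a little over 3 percent of the search problem, which is why the case holds up even if the "half" assumption is far too generous.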
