To Metadata or Not To Metadata

Page 2 of 3

Now let's assume you have decided that it is worthwhile to at least explore different approaches to adding metadata. How do you proceed? Three approaches are guaranteed to produce less than optimal results and poor cost-benefit ratios. One is to hire consultants, but this has a high upfront cost and ongoing maintenance costs. Two is to ask your authors to create metadata as they publish, but this leads to very low-quality metadata, especially keywords, which require a special skill that has nothing to do with subject matter expertise (not to mention the difficulty of getting authors to add it at all). Three is to use automatic metadata generation software, but the software often costs as much as consultants and does a worse job.

At the DCMI 2003 Workshop, a different approach to navigating the metadata dilemma was discussed at some length: the content-value-tier model offered by information architecture expert Lou Rosenfeld. The idea was fairly simple: Focus on a practical solution, focus on high-value content, and don't try to solve all the world's problems. High-value content can be identified using a variety of criteria like authority, popularity, currency, strategic value, and reusability. You can then choose to add full metadata to high-value content and less or none to low-value content.
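As a rough illustration, the tiering idea might be sketched as follows. The five criteria come from the description above; everything else (the equal weights, the 0-to-1 scoring scale, the thresholds, and the names `ContentItem`, `value_score`, and `metadata_tier`) is an assumption made for this sketch and not part of Rosenfeld's model.

```python
from dataclasses import dataclass

# Criteria named in the text. Equal weighting is an assumption for illustration.
CRITERIA = ("authority", "popularity", "currency", "strategic_value", "reusability")


@dataclass
class ContentItem:
    title: str
    scores: dict  # criterion name -> score in [0, 1], assigned by reviewers


def value_score(item, weights=None):
    """Weighted average of the criterion scores; missing criteria count as 0."""
    weights = weights or {c: 1.0 for c in CRITERIA}
    total = sum(weights.values())
    return sum(w * item.scores.get(c, 0.0) for c, w in weights.items()) / total


def metadata_tier(item, high=0.7, low=0.3):
    """Map a value score to a hypothetical level of metadata treatment."""
    score = value_score(item)
    if score >= high:
        return "full"
    if score >= low:
        return "minimal"
    return "none"


strategy_brief = ContentItem(
    "Product strategy brief",
    {"authority": 0.9, "popularity": 0.1, "currency": 0.8,
     "strategic_value": 1.0, "reusability": 0.9},
)
old_newsletter = ContentItem(
    "Archived newsletter",
    {"popularity": 0.2, "currency": 0.1},
)

for item in (strategy_brief, old_newsletter):
    print(item.title, "->", metadata_tier(item))
```

Note that someone still has to assign the scores and agree on the thresholds, so a sketch like this organizes the decision without making it for you.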

Unfortunately, even this approach has its shortcomings. One problem is that it doesn't really solve the problem of how best to add good metadata; it simply tries to limit the problem. A second is that, in my experience, it creates a number of new problems, the first of which is political. If you think that wars over placement on the home page can be vicious, trying to manage who gets metadata and who doesn't can be worse. And then there is the issue of who gets to decide what is of high value, which is another political minefield.

The use of relatively objective measures can help, but such measures have poor track records themselves. For example, if you use Web log usage statistics, you might find that the mass transit schedule is number one, while the document that someone needs in order to create a template for a critical proposal is ranked near the bottom, even though the proposal is for a $100 million project. It should also be pointed out that determining the value of content based on criteria like authority, popularity, and the like is itself a way of adding metadata to content. It just uses different metadata fields and applies metadata to collections instead of documents.

While I don't believe that this model provides an ideal solution, it does point in the right direction. It is based on looking at and differentiating content, and it uses multiple approaches, such as a set of criteria for high-value content. Finally, it is a possible, practical solution in certain cases, particularly when developed within an articulated strategic vision.

Intellectual Infrastructure
The first step in finding the right solution, or rather, the right set of solutions, is to examine the issue of metadata within a broad context of information and knowledge needs, or what I call the intellectual infrastructure context. It is important to look at metadata within this broad context so that you can consider the full set of answers to how to add metadata and how to use it. In some cases, that might mean less metadata, but in others it will mean more. If you instead view metadata as an add-on to a search engine project, you are essentially guaranteeing that you won't come up with the best set of solutions.

This intellectual infrastructure includes all kinds of content—structured and unstructured, internal and external, document-based and tacit knowledge inside the heads of employees. It includes metadata, taxonomies, controlled vocabularies, database schemas, persona models, and other knowledge organization structures. It also includes the publishing policies and procedures as well as the people who develop and support the creation and utilization of all the kinds of content. And finally, it includes information technologies like search engines, content management, portals, categorization and visualization software, and other applications that information and knowledge workers routinely use.

For example, content management is an essential part of any attempt to add metadata. Good content management software can support the integration of your metadata standards and, more importantly, controlled vocabularies. Good content management can also support various automation and workflow capabilities that can be used to increase the quality of metadata and decrease the cost.
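To make the workflow point a little more concrete, here is a minimal sketch of one kind of check a content management system could run at publish time: comparing author-supplied keywords with a controlled vocabulary. The vocabulary, the synonym table, and the function name `normalize_keywords` are all hypothetical; real products expose this kind of capability in their own ways.

```python
# Hypothetical controlled vocabulary and synonym table. In practice these would
# be maintained outside the code, not hard-coded like this.
PREFERRED_TERMS = {"content management", "taxonomy", "metadata", "search"}
SYNONYMS = {
    "cms": "content management",
    "taxonomies": "taxonomy",
    "meta data": "metadata",
}


def normalize_keywords(keywords):
    """Split author keywords into accepted preferred terms and rejects.

    Accepted terms are mapped to their preferred form and de-duplicated.
    Rejected keywords are returned as typed so a person can review them.
    """
    accepted, rejected = [], []
    for raw in keywords:
        term = raw.strip().lower()
        term = SYNONYMS.get(term, term)  # map known variants to the preferred term
        if term in PREFERRED_TERMS:
            if term not in accepted:
                accepted.append(term)
        else:
            rejected.append(raw)
    return accepted, rejected


accepted, rejected = normalize_keywords(["CMS", "Meta Data", "metadata", "synergy"])
print("accepted:", accepted)
print("needs review:", rejected)
```

A check like this does not make authors better at choosing keywords, but it catches variants and off-vocabulary terms cheaply and routes only the rejects to a person.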

Another component of an infrastructure approach to metadata is the makeup and services of a central team of (yes) people. This should be a cross-organizational team with library science well represented, but it should also include business analysts, user-focused individuals (anything from usability people to cultural anthropologists), and software specialists. This team can perform a number of functions that will lead to better and cheaper metadata. First, they would be in charge of creating, acquiring, and evaluating taxonomies, metadata standards, and controlled vocabularies. This team would also research developments in metadata theory, such as the latest RDF proposals.

Another function of the team would be to work with authors, evaluating metadata quality and methods for facilitating author-created metadata on the one hand, and analyzing the results of using metadata, tracking how the enterprise's communities were utilizing the metadata, on the other. Finally, this central team would also perform an essential, but often overlooked, role: socializing the benefits of metadata and helping to create a content and user-centric culture to replace the technology-centric culture too often found in information groups.
