Metadata—Not Just for Librarians Anymore

Apr 11, 2005

May 2005 Issue

In January this year I found myself in the wonderful city of Brussels. The reason for the trip was that I had been invited to act as an "expert" at a workshop that had been set up by the European Committee on Standardization (CEN) to discuss aspects of the use of the Dublin Core metadata scheme in the corporate sector. You may find this difficult to believe, but the attraction in attending was not the gastronomic delights of Brussels or the vanity of being an expert, but rather an increasing interest in metadata in the corporate sector.

I'm not going to describe the conference because the presentations are available online (ftp://ftp.cenorm.be/public/ws-mmidc/mmidc140.htm). Suffice it to say the event was both successful (in large part due to the expert chairmanship of Taxonomy Strategies' Joseph Busch) and unsuccessful because we still are far from achieving any sort of agreement about how to apply the Dublin Core model in the corporate sector.

The implementation of content management software has brought metadata to center stage, but there is not much of an audience out there. This is understandable because it takes someone with the presenting skills and wit of Gerry McGovern (a fellow expert) to make metadata not only interesting, but funny. Rarely am I able to have a sensible conversation with anyone outside of the library profession about the importance of metadata.

To gain a bit of hands-on metadata experience, a good place to start is the Properties box in Microsoft Office. Few people know it exists, and even those who do rarely bother to fill it in. I have yet to come across any corporate management policy from one of my clients that explains how useful the Properties box can be and makes it mandatory to fill in. Yet these same clients are (somewhat mysteriously) in total agreement that staff will have no problems adding metadata to documents when they use the newly installed CMS. The reality is that most organizations have little understanding of the nature and value of metadata, the cost of adding it to documents, and the impact of not being able to locate documents when metadata is poorly managed.

One common fallacy about metadata is that it is just a matter of adding keywords to documents, a fallacy perpetuated by CMS vendors who should know better than to provide metadata boxes in templates labeled "Keywords." In my view there are four broad categories of metadata: Structural metadata describes the information architecture of the document. These metadata elements might include Title, Summary, Image, etc. Content metadata provides a way of identifying documents that may contain relevant subject information. This is what most people think of when the word metadata is mentioned. Descriptive metadata enables the type of document to be identified. In this way a search could be limited to Web content, streaming video, etc. Finally, Administrative metadata deals with the person and department owning the document, the date the document should be checked for relevancy, and the language of the document.
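To make the four categories a little more concrete, here is a minimal sketch in Python of what a single metadata record covering all of them might look like. It does not reflect any particular CMS; the class name, field names, helper functions, and sample documents are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class DocumentMetadata:
    """Hypothetical record holding one field group per category."""
    # Structural: the information architecture of the document
    title: str
    summary: str
    # Content: subject terms that help identify relevant documents
    subjects: List[str]
    # Descriptive: the type of document
    document_type: str
    # Administrative: ownership, review date, and language
    owner: str
    department: str
    review_date: date
    language: str = "en"


def filter_by_type(docs, document_type):
    """Limit a result set to one document type (descriptive metadata)."""
    return [d for d in docs if d.document_type == document_type]


def due_for_review(docs, today):
    """Return documents whose review date has been reached (administrative metadata)."""
    return [d for d in docs if d.review_date <= today]


# Illustrative sample data, not taken from any real system
docs = [
    DocumentMetadata(
        title="Travel expenses policy",
        summary="How to claim travel costs",
        subjects=["expenses", "travel"],
        document_type="web content",
        owner="A. Example",
        department="Finance",
        review_date=date(2005, 6, 1),
    ),
    DocumentMetadata(
        title="Quarterly briefing",
        summary="Recorded staff briefing",
        subjects=["strategy"],
        document_type="streaming video",
        owner="B. Example",
        department="Communications",
        review_date=date(2006, 1, 1),
    ),
]

videos = filter_by_type(docs, "streaming video")
overdue = due_for_review(docs, date(2005, 7, 1))
```

Even in a toy record like this, the subject keywords are only one field among eight, which is the point about the "Keywords" box.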

One of the problems we face is that there seems to be no consistency in categorizing and describing metadata; as yet, there is no Metadata for Dummies book. The situation is not helped by publications such as "Understanding Metadata" from the U.S. National Information Standards Organization (www.niso.org). I am not at all sure what audience NISO was writing for. To quote from this 20-page document: "An extension is the addition of elements to an already developed scheme to support the description of an information resource of a particular type or subject or to meet the needs of a particular interest group. Extensions increase the number of elements. Profiles can constrain the number of elements that will be used, refine element definitions to describe the specific types of resources more accurately, and specify values that an element can take." I find that about as clear as mud.
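For what it is worth, the distinction NISO is drawing can be sketched in a few lines of Python. This is only my reading of the quoted passage, not anything taken from the NISO document: the base elements are a small subset of Dublin Core element names, while `department`, `reviewDate`, the function names, and the allowed language codes are hypothetical.

```python
# A small subset of Dublin Core element names, used as the base scheme
BASE_SCHEME = {"title", "creator", "subject", "date", "language"}


def extend(scheme, extra_elements):
    """An extension adds elements to an existing scheme."""
    return set(scheme) | set(extra_elements)


# Hypothetical corporate extension: two extra elements
EXTENDED_SCHEME = extend(BASE_SCHEME, {"department", "reviewDate"})

# A profile constrains which elements are used and which values they may take
PROFILE = {
    "elements": {"title", "subject", "language", "reviewDate"},
    "allowed_values": {"language": {"en", "fr", "nl"}},
}


def check_record(record, profile):
    """List the ways a record breaks the profile (empty list means it conforms)."""
    problems = []
    for element in sorted(record):
        if element not in profile["elements"]:
            problems.append(f"{element}: not in profile")
    for element, allowed in sorted(profile["allowed_values"].items()):
        if element in record and record[element] not in allowed:
            problems.append(f"{element}: value {record[element]!r} not allowed")
    return problems


good = check_record({"title": "Travel policy", "language": "en"}, PROFILE)
bad = check_record(
    {"title": "Travel policy", "language": "xx", "creator": "me"}, PROFILE
)
```

In short, the extension makes the scheme bigger, and the profile says which parts of it a given group will actually use and how.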

The basic concept of content management software is that unstructured text can be managed in a database. However, that database is only as good as the metadata that is added to the content in the first place. There is a really important educational task to be carried out here, and I'm not sure who should be taking the lead. Congratulations to CEN for at least lighting the torch; but who is going to carry it? There are few papers at conferences on the subject, and some books, though most are targeted at the information profession. I recommend Metadata for Information Management and Retrieval by David Haynes (2004, Facet Publishing).

Of course the really sad thing is that the people with significant expertise in metadata are in the library and information science community, yet they are about the last people anyone in IT speaks to before buying a CMS, much less looks to when crafting a metadata policy or strategy.