Playing with Taxonomies



BEST PRACTICES SERIES

Unless you're a biologist or librarian, "taxonomy" is a term so esoteric that it makes your eyes roll. Yet redesign your intranet or discuss using keywords to tag content for easier searching, and suddenly the concept, if not the term, raises everybody's blood pressure. As in the old joke, there are as many opinions about the meaning of taxonomy as there are information stakeholders. Let me start by giving you my definition: Taxonomy is a logical organization of information categories. By "logical" I mean high-level, not the opposite of "illogical." Keywords and navigation schemes are some of the physical manifestations of taxonomies. The logical/physical distinction is somewhat like that between a prototype and the retail products built from it, or between an XML schema and its instance documents.

Certain professional groups and verticals have developed standard taxonomies for specific subject areas. MeSH (Medical Subject Headings), for example, categorizes medical subjects. MeSH is emotionally neutral for most of us. Our own content is another matter: we want to tag it in our own way, and we mightily resist any group imposing its views on it. That's the problem with taxonomies. All politics is local, and the same holds true for many taxonomies, at least when they attempt to capture deep-seated ways of looking at information close to home.

In a recent D.C. meeting about keywords, I asked participants to name the national liberal newspaper read by most Washingtonians. Answers I received were "Washington Post," "Post," "Washington Times" (a real surprise), and "The Washington Post." Only the last response was correct. What keyword would you propose to tag content from that newspaper? Or, more importantly, which one would you use to search for articles sourced in The Washington Post? Pick the wrong keyword, or type it in the wrong case, and you might find nothing. Make a keyword too long and people might refuse to use it. I found similar problems with departments tagging their HTML pages on an intranet. My suggestion in both instances was to purge the keywords on the intranet and not begin tagging company content until we had a clear idea how we would develop, define, and use the taxonomies.
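To make the newspaper example concrete, here is a minimal sketch of how a tagging tool might map the variants people actually type onto one agreed keyword. The synonym table, the function names, and the choice to ignore case and extra spaces are all assumptions made for illustration; they are not taken from any product discussed here.

```python
import re

# Hypothetical synonym table: one agreed keyword, plus the variants
# that people in the meeting might plausibly type for it.
CANONICAL = {
    "The Washington Post": ["The Washington Post", "Washington Post", "Post"],
}

def _key(text):
    """Lowercase and collapse whitespace so case and spacing do not matter."""
    return re.sub(r"\s+", " ", text.strip()).lower()

# Reverse index: normalized variant -> agreed keyword.
LOOKUP = {
    _key(variant): canonical
    for canonical, variants in CANONICAL.items()
    for variant in variants
}

def normalize(keyword):
    """Return the agreed keyword for a typed variant, or None if unknown."""
    return LOOKUP.get(_key(keyword))
```

With this table, normalize("washington post") and normalize("  POST ") both resolve to the same keyword, while an answer that names a different newspaper, such as "Washington Times", is simply not recognized. The hard part remains the one described above: getting people to agree on what goes in the table.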

Most agree that taxonomies are critical to good information systems, but how can we resolve the emotion and inconsistency inherent in local taxonomies? I suggest that one way is to play. Video games raised awareness and helped fund new generations of graphics hardware and software; Napster demonstrated the power of P2P technology. A similar trend is rising with taxonomies. Several Web sites are now being built to enhance social activities in a way that demands the use of socially developed taxonomies. Del.icio.us and Flickr represent two such efforts, with large groups of people describing their content in a way that they all can share. Del.icio.us describes itself as a "social bookmarks manager" and "very very pre-alpha." It is designed to let participants share Web bookmarks. Flickr offers online photo sharing and is much further along. On both sites, finding information demands some agreed-upon, dynamic way of classifying content and changing that classification as the content grows exponentially. The Del.icio.us hook is to let you see the links that others have collected and who else has bookmarked a specific site. You can also subscribe to the links of people whose lists you find interesting. The use of good links, like good keywords, grows. Feedback is immediate: you can see how large the clusters of people are who agree with your tags (by using them) or disagree (by not using them).
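The feedback loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout (user, URL, tags), the sample data, and the function name are hypothetical, and nothing here reflects how Del.icio.us or Flickr actually store or count tags.

```python
from collections import Counter

# Hypothetical sample data: (user, bookmarked URL, tags that user applied).
SAMPLE = [
    ("ann", "http://example.com/a", ["Taxonomy", "tagging"]),
    ("bob", "http://example.com/a", ["taxonomy"]),
    ("cy", "http://example.com/a", ["taxonomy", "folksonomy"]),
    ("ann", "http://example.com/b", ["photos"]),
]

def tag_agreement(taggings, url):
    """Count how many taggings of `url` used each tag, ignoring case."""
    counts = Counter()
    for _user, tagged_url, tags in taggings:
        if tagged_url == url:
            # A set keeps one record from counting the same tag twice.
            counts.update({tag.lower() for tag in tags})
    return counts
```

Calling tag_agreement(SAMPLE, "http://example.com/a") shows three people converging on "taxonomy" and one each using "tagging" and "folksonomy". A large count is the "agree" cluster; a tag that only you use is the quiet form of disagreement.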

I spoke with Stewart Butterfield of Ludicorp, developer of Flickr, about this effort. (Latin scholars note: The Ludi in "Ludicorp" suggests the Latin words for play and game.) Stewart had many insights about this new approach to building taxonomies. "If you can hire enough excellent librarians, you will get better keyword results than with social approaches. However, as the content grows, tagging (and retagging) becomes an order of magnitude more difficult. In other words, social approaches are 80% as good as and 10 times easier than top-down approaches." As to whether Flickr's approach would work in the button-down corporate world, Butterfield had this to say: "Anticipate resistance in the CIO crowd who don't want to risk losing control in a social self-correcting process and do not want anything to get lost." Butterfield says that at least 55% of photos uploaded to Flickr have one or more tags, and 66% have both a tag and user-supplied metadata. As of early September, Flickr had 500,000 photos on its site and was growing at the rate of 15,000 to 20,000 more each day.

Would CIOs resist this kind of enthusiastic, collaborative taxonomy use? I wonder, especially if the tools vendors take note of these social, playful approaches to building taxonomies.