It’s TAXonomy Time

In April, when this column is printed, Americans will be engaged in the annual ritual of calculating and paying state and federal taxes. Although tax activity is most intense in April, we all pay taxes daily in many forms, such as sales tax. Just as taxes have always been with us, so too have taxonomies. Like dealing with taxes, taxonomy management is an evolving process that never ends. There are many definitions of “taxonomy,” but I view taxonomies as simply the various ways we categorize and manage groups of things so we can find them, whether they’re dishes in a cupboard or scrolls in the ancient library of Alexandria.

On several recent projects in different business sectors, I’ve heard “taxonomy” proposed as part of a hoped-for solution to counter the collective sigh about the cost and difficulty of finding electronic files. I recently asked members of a taxonomy discussion group whether they believe interest in taxonomies is growing, and to a person, they insisted it is. Information proliferation has triggered this interest, but how do you proceed?

Continuing the tax analogy: Years ago I calculated my own taxes; life and finances were much simpler. As the tax code grew, tax preparation became more complicated. I had to develop a continuous process for collecting and sorting paper receipts. At some point, the pile of receipts grew so big and the tax code became so complicated that I gave up and started paying someone to prepare my taxes. Not only was that expensive but I also couldn’t be certain I was minimizing my tax payments, and I still felt like I was doing most of the work. I began to consider using tax software; maybe that’s what the tax advisor was using anyway, so why not cut out the middleman? My tax problem then grew from a continuous activity of saving receipts to using a spreadsheet to categorize them. From there, I graduated to TurboTax to crank out the taxes and other financial software for long-term planning. I picked TurboTax because in effect it was “integrated” with my broker, being able to import annual brokerage statements. The financial management problem is better, but it still requires continuous effort.

And so goes the evolution of taxonomy efforts. Substitute network drives for piles of paper receipts. Move up to content management systems, and you now need to plan and manage metadata and folder structures (taxonomies, in effect). Most planning starts with spreadsheets, yet these efforts alone do not consider the broader need for a planned and governed categorization process, covering the subtleties of different views of findability: social tagging, synonym development, and graphical views of hierarchical categories expressed as folder “trees.” Those taxonomy trees keep growing too, and they need to be pruned and managed continuously.

Taxonomy project teams often look to specialized tools as a panacea. However, taxonomy software is growing more sophisticated and expensive, and few programs promise that the effort you’ve invested in spreadsheets can be reused. Unlike TurboTax, which can import annual brokerage statements, taxonomy tools sometimes advertise only “text” or XML import, which is like asking what language a book is written in and being told “text and numbers.”

To confirm my belief about taxonomy trends, I spoke with representatives of two leading taxonomy consulting and product vendors: Seth Earley, founder and president of Earley & Associates (whose firm purchased the taxonomy tool Wordmap), and Carol Hert, Ph.D., chief taxonomist and consultant for SchemaLogic, Inc. Details are on my blog, but two important points emerged. First, taxonomy tools are at best only part of the solution. Earley, whom I’d expected to be touting Wordmap, said, “Most of the time you don’t need a tool; you can manage with spreadsheets. Tools are useful (such as for distributed authoring), but most organizations aren’t sufficiently mature for them. We’ve been in the industry for a long time; we’ve had Wordmap for 2 years. But organizations need to solve other problems such as governance first.” Second, as with my tax tool, integration with what you already have is critical. Hert said that taxonomy projects must plan to “publish those taxonomies to multiple applications such as SharePoint and Documentum as well as to search engines like FAST or autoclassification systems such as Teragram.” Both Earley and Hert said their tools can use CSV exports from spreadsheets.

Sadly, just as there is no going back to simpler practices in tax preparation, there is no panacea for managing information overload. Yet, like death and taxes, taxonomies are inevitable.