Why Taxonomies Need XML

Page 1 of 3


Taxonomies are everywhere these days. They organize our content management systems. They drive navigation through our websites. They even rewrite the queries we submit to search engines. Even more ubiquitous is XML, the acknowledged lingua franca of the Information Age. The concurrent ascendancy of taxonomies and XML isn't surprising. Taxonomies provide common terminology for the information we wish to communicate, and XML provides a "human-legible and reasonably clear" (according to the standard, at least) way to structure and exchange that information. Oddly, though, despite their complementary nature, the two technologies are rarely in the same place at the same time.

Despite the explosion of the taxonomy software and services industry over the past several years (it has been growing at an average annual rate of 21% since 2002), the vast majority of taxonomies are still created and maintained in Microsoft Excel. Spreadsheets are easy to create and nearly everyone has a copy of Excel, so sharing a taxonomy file with colleagues is simple. But while shuffling terms from column to column may come as second nature to even a novice taxonomist, problems arise once the taxonomy is complete and ready to be used. "Once you have created a taxonomy, the question is how do you move that taxonomy around? How do you have it in one system and export it to another system?" asks Ed Rogers, CEO of CMS vendor Ektron. 

Most will answer this question by exporting the vocabulary to a simple comma-delimited file. This approach worked just fine when the taxonomy was used mainly by people, with perhaps one application that actually needed to ingest it. Things are not that simple anymore. Most enterprise infrastructures now consist of multiple applications that must interact and need to understand a common vocabulary to do so. Unfortunately, each one of those systems will have its own way of looking at a comma-delimited taxonomy file. A content management system may treat the word in the second column as a child of the term in the first column of the same row, while your search engine looks to the row above for that relationship. The categorization engine may not even look for terms until it gets to the fourth or fifth column of the file. The taxonomy must be shared in a way that retains the role or function of each term in relation to the entire vocabulary. "The only way to really do that is through some structured format," says Rogers, "and that structured format is XML."
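To make the ambiguity concrete, here is a minimal Python sketch of two readers applying different conventions to the same two-row file. The sample terms and both conventions are assumptions invented for illustration; they are not taken from any real CMS or search product.

```python
import csv
import io

# A tiny comma-delimited taxonomy export (hypothetical sample data).
CSV_TEXT = "Vehicles,Cars\nTrucks,Pickups\n"


def read_rows(text):
    """Parse the CSV text into a list of non-empty rows."""
    return [row for row in csv.reader(io.StringIO(text)) if row]


def same_row_pairs(rows):
    """Assumed CMS convention: column 2 is a child of column 1 in the same row."""
    return [(row[0], row[1]) for row in rows if len(row) >= 2 and row[0] and row[1]]


def row_above_pairs(rows):
    """Assumed search-engine convention: column 2 is a child of column 1 in the row above."""
    pairs = []
    for above, current in zip(rows, rows[1:]):
        if above and len(current) >= 2 and above[0] and current[1]:
            pairs.append((above[0], current[1]))
    return pairs


rows = read_rows(CSV_TEXT)
cms_view = same_row_pairs(rows)
search_view = row_above_pairs(rows)

print("CMS reading:   ", cms_view)
print("Search reading:", search_view)
```

Both readers parse the file without error, yet they end up with different parent-child relationships. Nothing in the file itself says which reading is the intended one.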

Encapsulating a taxonomy in XML enables you to explicitly represent its structure in a way that any XML-enabled tool can understand unambiguously. The "extensible" in eXtensible Markup Language ensures that you can define the exact structure you need for the task at hand and communicate it clearly. Wrapping a keyword in tags that name its role in the vocabulary and pointing the application at an appropriate schema or DTD will go a long way toward eliminating inconsistencies across systems and misunderstanding among users. "The only time you don't really need to care about keeping taxonomies in XML is if you are not going to be using that taxonomy across any other system," Rogers sums up. "Ektron's philosophy is to always keep it in an XML format."
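As a rough illustration, the sketch below stores a small taxonomy as nested XML and reads the parent-child relationships back with Python's standard xml.etree.ElementTree module. The element name `term`, the attribute `name`, and the sample vocabulary are assumptions made up for this example, not a published taxonomy schema. The sketch also skips validation against a schema or DTD, which the standard-library parser does not perform.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: each <term> nests its narrower terms inside itself.
TAXONOMY_XML = """
<taxonomy>
  <term name="Vehicles">
    <term name="Cars">
      <term name="Sedans"/>
    </term>
    <term name="Trucks"/>
  </term>
</taxonomy>
""".strip()


def parent_child_pairs(element, parent_name=None):
    """Walk the nested <term> elements and collect (parent, child) pairs."""
    pairs = []
    for term in element.findall("term"):  # direct children only
        name = term.get("name")
        if parent_name is not None:
            pairs.append((parent_name, name))
        pairs.extend(parent_child_pairs(term, name))
    return pairs


root = ET.fromstring(TAXONOMY_XML)
pairs = parent_child_pairs(root)

for parent, child in pairs:
    print(f"{parent} -> {child}")
```

Because the hierarchy is carried by the nesting itself, any tool that follows the same markup recovers the same relationships, with no column or row convention to guess at.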
