Why Taxonomies Need XML

Page 3 of 3


BEST PRACTICES SERIES

Semantic Enablement
Ontology may be the latest double-point word in buzzword Scrabble, but as the semantic web becomes a reality, taxonomy builders will need to pay attention. As we sort through candidate terms and marshal keywords into a taxonomy, it is easy to lose sight of the fact that what we are actually building is not just a list of preferred terms but a Knowledge Organization System (KOS). We are attempting to create a way to efficiently communicate information and knowledge across an organization and beyond.

Jay Ven Eman, CEO of Access Innovations-Data Harmony, is quick to point out that we should not lose sight of that goal. "The markup itself does not necessarily provide any inherent value beyond the ease with which it can be read by machines or humans. XML does, however, provide attributes and entities which can make it easier to add intelligence about an element." As Ven Eman correctly points out, that intelligence involves more than explicitly capturing the parent-child relationships in a standard taxonomy. "Within ontology, one can also say a great deal more about the knowledge representation system and the objects being represented," says Ven Eman.
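To make the baseline concrete, here is a minimal sketch of a taxonomy fragment in XML, where element nesting captures parent-child relationships and attributes carry a little extra detail about each term. The element and attribute names (`taxonomy`, `term`, `id`, `label`) and the helper `parent_child_pairs` are hypothetical illustrations, not drawn from any product or standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical taxonomy markup: nesting records which term is broader,
# and the "id" and "label" attributes carry extra detail about each term.
TAXONOMY_XML = """
<taxonomy>
  <term id="t1" label="Vehicles">
    <term id="t2" label="Cars">
      <term id="t3" label="Sedans"/>
    </term>
    <term id="t4" label="Bicycles"/>
  </term>
</taxonomy>
"""

def parent_child_pairs(xml_text):
    """Return (parent label, child label) pairs implied by element nesting."""
    root = ET.fromstring(xml_text.strip())
    pairs = []
    # Visit every <term> in document order; its direct <term> children are narrower terms.
    for parent in root.iter("term"):
        for child in parent.findall("term"):
            pairs.append((parent.get("label"), child.get("label")))
    return pairs

for parent, child in parent_child_pairs(TAXONOMY_XML):
    print(f"{parent} -> {child}")
```

All the sketch can recover is that one term sits under another. Whether Cars is a kind of Vehicle or a part of one is recorded nowhere, and that missing information is the extra intelligence an ontology adds.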

An ontology is essentially a model of a subject that includes not only things but also the relationships among those things and the nature of those relationships. A sophisticated taxonomy might know that two things are related, but not necessarily how; an ontology captures that intelligence. As a result, knowledge that the content authors did not think of, and that is not explicitly present in the model, can be inferred. When content is created according to such a model, it becomes semantically enabled and can begin to adapt automatically to its audience.
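A minimal sketch of how typed relationships enable inference, assuming facts are stored as (subject, relation, object) triples and that the model declares which relation types are transitive. The facts, the relation names `part_of` and `used_for`, and the `infer` helper are all hypothetical; real ontology languages express such rules far more richly.

```python
# Hypothetical facts: each is a (subject, relation, object) triple.
FACTS = {
    ("piston", "part_of", "engine"),
    ("engine", "part_of", "car"),
    ("car", "used_for", "transport"),
}

# Part of the model's meaning: which relation types are transitive.
TRANSITIVE = {"part_of"}

def infer(facts, transitive):
    """Return the facts plus everything implied by transitive relations."""
    known = set(facts)
    changed = True
    while changed:  # repeat until no new fact appears (a fixed point)
        changed = False
        for a, rel, b in list(known):
            if rel not in transitive:
                continue
            for c, rel2, d in list(known):
                # If a rel b and b rel d, then a rel d.
                if rel2 == rel and c == b and (a, rel, d) not in known:
                    known.add((a, rel, d))
                    changed = True
    return known

new_facts = infer(FACTS, TRANSITIVE) - FACTS
print(sorted(new_facts))  # [('piston', 'part_of', 'car')]
```

No one wrote down that the piston is part of the car; the fact follows because the model knows what kind of relationship `part_of` is. A taxonomy that only records "related" could not draw that conclusion.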

This turns the relationship between information and audience on its head. "The paradigm has always been to put the onus and responsibility for finding the content on the user, until now," says Ken Berger, director of account management for SiberLogic. "Because we built the model, we can take into account whatever is in the head of the user when they sit down. Are they thinking about a component, as opposed to a function, as opposed to a user, as opposed to a whatever? They can come in and based on what they are thinking about I can navigate according to that perspective and get to my content at will." SiberLogic estimates that this reduces content development effort by as much as 50%. "That's because the content doesn't change," Berger says, "just the method of navigation, and the model can drive that change automatically."
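The navigation Berger describes can be sketched as faceted grouping: each piece of content is written once and linked to concepts in several facets, and only the grouping changes with the reader's perspective. This is an illustration under stated assumptions, not SiberLogic's implementation; the topic IDs, facet names, and `navigate` function are hypothetical.

```python
from collections import defaultdict

# Hypothetical content model: each topic is written once and linked to
# concepts in several facets (component, function, user role).
TOPICS = {
    "replace-filter": {"component": "pump", "function": "maintenance", "user": "technician"},
    "start-pump": {"component": "pump", "function": "operation", "user": "operator"},
    "calibrate-valve": {"component": "valve", "function": "maintenance", "user": "technician"},
}

def navigate(topics, perspective):
    """Group the same topics under the facet the reader is thinking in."""
    tree = defaultdict(list)
    for topic_id, facets in sorted(topics.items()):
        tree[facets[perspective]].append(topic_id)
    return dict(tree)

print(navigate(TOPICS, "component"))
print(navigate(TOPICS, "user"))
```

Switching from `"component"` to `"user"` reorganizes the navigation without touching any topic, which is the sense in which the content stays fixed while the model drives the change.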

But here pure XML begins to break down. Because it is intended to represent structure and not meaning, XML has no inherent semantics. Berger explains, "XML has helped us a great deal in coming from an unstructured to a structured world and we now understand the difference between a paragraph, title, and chapter, but what is between the tags is just text."
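A short sketch of Berger's point using Python's standard XML parser: the structural roles are fully machine-readable, but the text inside them carries no meaning the software can use. The sample document and its tag names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical structured content: the tags name structural roles only.
DOC = """<chapter>
  <title>Pump Maintenance</title>
  <para>Replace the filter on the pump before each season.</para>
</chapter>"""

root = ET.fromstring(DOC)

# The parser reliably tells a title from a paragraph...
structure = [child.tag for child in root]
print(structure)  # ['title', 'para']

# ...but the paragraph body is an opaque string. Nothing in the markup says
# that "filter" is a component or that it belongs to "pump".
body = root.find("para").text
print(type(body).__name__, repr(body))
```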

This sort of informal knowledge, locked in the text between the tags, is the Achilles' heel of knowledge organization systems.

To address this gap, knowledge modelers are turning to the Resource Description Framework (RDF). RDF is essentially a metadata framework that captures semantic relationships in a standard, machine-understandable way, using XML syntax as its interchange mechanism. Its nature as a generic framework makes it very flexible, but it also leaves RDF open to the same vulnerability as XML when it comes to competing implementations. Without a common implementation, we are again left writing adapters and transformations, only this time they will be much more complex.
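A minimal sketch of how an RDF statement travels in XML syntax (RDF/XML), built with Python's standard library. The `rdf` namespace URI is the real W3C one; the `ex` vocabulary, the resource URIs, and the `partOf` property are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX_NS = "http://example.org/terms#"  # hypothetical vocabulary

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("ex", EX_NS)

def triples_to_rdfxml(triples):
    """Serialize (subject URI, predicate local name, object URI) triples as RDF/XML."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    for subject, predicate, obj in triples:
        # One rdf:Description per triple; RDF/XML permits repeated descriptions of a subject.
        desc = ET.SubElement(root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": subject})
        ET.SubElement(desc, f"{{{EX_NS}}}{predicate}", {f"{{{RDF_NS}}}resource": obj})
    return ET.tostring(root, encoding="unicode")

print(triples_to_rdfxml([
    ("http://example.org/piston", "partOf", "http://example.org/engine"),
]))
```

The meaning lives in the subject-predicate-object statement, not in the XML nesting. RDF/XML allows the same statements to be written in more than one way, so tools exchanging RDF should compare the resulting triples rather than the raw markup.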

Fortunately, a standard has emerged in the form of the Simple Knowledge Organization System (SKOS). SKOS is an application of RDF that is primarily intended for declaring and publishing networks of topics in a machine-understandable form. As such, it is a first step toward formal ontologies for the semantic web. The next step is the Web Ontology Language (OWL), which extends the semantic capabilities of RDF. Both technologies may rightly be considered emergent, and many practitioners are taking a cautious wait-and-see attitude. The semantic web is coming, however, and structuring taxonomies with a well-defined, well-understood mechanism like XML will position them as building blocks for whatever ontologies emerge.
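To show why a well-structured taxonomy is a ready building block, here is a minimal sketch that maps a simple term hierarchy onto SKOS statements. The SKOS and RDF namespace URIs and the property names (`skos:Concept`, `prefLabel`, `altLabel`, `broader`, `narrower`) come from the SKOS vocabulary; the concept base URI, the sample terms, and the helper functions are hypothetical. Labels are kept as plain Python strings for brevity, whereas a real RDF toolkit would type them as literals, and `skos:ConceptScheme`/`skos:inScheme` are omitted.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/concepts/"  # hypothetical concept namespace

# A plain taxonomy: preferred term -> (broader term or None, synonyms).
TAXONOMY = {
    "Vehicles": (None, []),
    "Cars": ("Vehicles", ["Automobiles"]),
    "Bicycles": ("Vehicles", ["Bikes"]),
}

def concept_uri(label):
    """Mint a hypothetical URI for a term."""
    return BASE + label.lower()

def taxonomy_to_skos(taxonomy):
    """Express each term as a skos:Concept with labels and broader/narrower links."""
    triples = set()
    for label, (parent, synonyms) in taxonomy.items():
        c = concept_uri(label)
        triples.add((c, RDF_TYPE, SKOS + "Concept"))
        triples.add((c, SKOS + "prefLabel", label))
        for syn in synonyms:
            triples.add((c, SKOS + "altLabel", syn))
        if parent is not None:
            p = concept_uri(parent)
            # SKOS defines narrower as the inverse of broader; this sketch states both.
            triples.add((c, SKOS + "broader", p))
            triples.add((p, SKOS + "narrower", c))
    return triples

for s, p, o in sorted(taxonomy_to_skos(TAXONOMY)):
    print(s, p.rsplit("#", 1)[-1], o)
```

Because the hierarchy and synonyms were already explicit, the mapping is mechanical; a taxonomy kept as loose keyword lists would need human interpretation before it could be published this way.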


Companies Featured

Access Innovations-Data Harmony
www.accessinn.com 

Ektron 
www.ektron.com 

Endeca 
www.endeca.com

Factiva 
www.factiva.com 

SiberLogic 
www.siberlogic.com 

Taxonomy Strategies 
www.taxonomystrategies.com
